Open jeffbl opened 2 years ago
Assigned to @rohanakut , but pinging @gp1702 since we discussed this this morning, in case you have anything to add in comments.
Some of the test images are now missing IIRC, but we have the ones I did last year listed (with objects found) in #213. You can add to the list there, or create a list using some other mechanism, since this keeps coming up...
(i.e., one option would be a script that runs object detection nightly on a set of test images and flags you if the results change, since that may indicate something broke; it would also give us an ongoing record of how object detection/semseg output changes over time as we change models, etc.)
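The nightly check described above could be sketched roughly as follows. This is only an illustration: `detect()` is stubbed out here (a real version would call the production model), and the names `BASELINE_PATH`, `detect`, and `check_against_baseline` are hypothetical, not from the actual codebase.

```python
# Sketch of a nightly regression check for object detection output.
# detect() is a stub standing in for the real model call; all names here
# are illustrative, not from the production codebase.
import json
from pathlib import Path

BASELINE_PATH = Path("detection_baseline.json")

def detect(image_path):
    # Stub. With a real YOLOv5 model this might instead be something like:
    #   results = model(image_path)
    #   return sorted(results.pandas().xyxy[0]["name"])
    return {"street.jpg": ["car", "person", "person"],
            "dining.jpg": ["bowl", "chair"]}.get(image_path, [])

def check_against_baseline(image_paths):
    """Return the images whose detected labels changed since the baseline."""
    current = {p: detect(p) for p in image_paths}
    if not BASELINE_PATH.exists():
        BASELINE_PATH.write_text(json.dumps(current, indent=2))
        return {}  # first run only records the baseline
    baseline = json.loads(BASELINE_PATH.read_text())
    return {p: {"was": baseline.get(p), "now": labels}
            for p, labels in current.items()
            if baseline.get(p) != labels}

if __name__ == "__main__":
    changed = check_against_baseline(["street.jpg", "dining.jpg"])
    if changed:
        print("Object detection output changed:",
              json.dumps(changed, indent=2))
```

Running this from cron and alerting when the returned dict is non-empty would give both the "flag you" behavior and, by versioning the baseline file, the ongoing record mentioned above.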
I have tested two models that are considerably larger than the current one: YOLOv5s6 and YOLOv5l6 (can be found here). My conclusions:
YOLOv5l6 is the biggest model currently available. Testing it on images, I found that its output was not very different from the current model's. Moreover, it takes around 1.5 s per request, versus 0.3 s for the current model. I have therefore decided not to use it, since there is no significant benefit to including it in our pipeline.
The second model I tested was YOLOv5s6. Its results are also very similar to the current model's; I have added them in #213. It takes roughly 0.4 s per request, so it can be considered.
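For reference, per-request timings like the 0.3 s / 0.4 s / 1.5 s figures above could be measured with a small harness like this. The `run_inference` callable is a stand-in: with a real model it might wrap a `torch.hub.load("ultralytics/yolov5", ...)` model, but here a stub keeps the sketch self-contained.

```python
# Minimal per-request latency benchmark of the kind used to compare models.
# run_inference is a stand-in for real model inference.
import time

def mean_latency(run_inference, inputs, warmup=2, repeats=5):
    """Average wall-clock seconds per request, after warm-up calls."""
    for x in inputs[:warmup]:
        run_inference(x)  # warm up caches / lazy initialization
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            run_inference(x)
    return (time.perf_counter() - start) / (repeats * len(inputs))

if __name__ == "__main__":
    fake_model = lambda img: sum(img)  # stub inference
    print(f"{mean_latency(fake_model, [list(range(100))] * 4):.6f} s/request")
```

Warm-up runs matter here because the first call to a freshly loaded model is typically much slower than steady-state requests.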
@jeffbl please let me know whether, given this output, you feel we should change the current model. Pinging @gp1702 as well to weigh in on this topic.
Thanks for testing this. Agreed, that's a massive performance hit for the largest model unless there is a very clear quality benefit. Given our testing is very limited, I'd be happy to take the extra 0.1 s/query to go with YOLOv5s6, since it may make more of a difference in cases we haven't encountered yet. Do you know if it takes significantly more memory?
@jeffbl I haven't tested memory consumption, since the current object detection models are not running on the GPU.
Based on some testing of graphics from our tutorial page, YOLOv5s6 does not seem to be an improvement:
| tutorial | YOLOv5x (pegasus) | YOLOv5s6 (unicorn) |
| --- | --- | --- |
| mountain landscape | only object found: a horse | nothing found |
| street scene | 6 people, 2 umbrellas, 2 traffic lights, 4 cars, a bus, and a backpack | 5 people, 4 cars, a backpack, an umbrella, and a traffic light |
| dining | 5 bowls, 4 chairs, 2 vases, 4 wine glasses, 4 people, a cup, a potted plant, a couch, a dining table, and a bottle | 3 chairs, 5 people, 1 potted plant, 1 bottle, 1 bowl, and 1 dining table |
The hypothesis is that since the YOLOv5s6 model has far fewer parameters than the in-production YOLOv5x, it struggles to identify objects. I'd say based on these examples, it would not be a win to move from YOLOv5x to YOLOv5s6. We could try a larger high-resolution model, but gains may not be huge anyway.
To really do a thorough job, we would need to identify a test set of representative graphics, and have a script or something that outputs the objects found for various model sizes. However, after discussion with @rohanakut I'm postponing this work item to post CSUN (backlog), when it should be considered alongside moving to Azure or another object detection solution.
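The script described above could be sketched like this: run each candidate model over a fixed test set and tabulate the objects found, so model changes are easy to diff side by side. The model names and the `detect_fn` signature are illustrative; the stub detectors stand in for actual YOLOv5 inference.

```python
# Sketch of the proposed comparison script: tabulate object counts per
# model over a fixed test set. All names here are illustrative.
from collections import Counter

def summarize(detect_fn, image_paths):
    """Map image path -> Counter of detected class labels for one model."""
    return {p: Counter(detect_fn(p)) for p in image_paths}

def comparison_table(models, image_paths):
    """models: {name: detect_fn}. Returns rows of (image, model, summary)."""
    rows = []
    for name, fn in models.items():
        for img, counts in summarize(fn, image_paths).items():
            summary = ", ".join(f"{n} {label}"
                                for label, n in sorted(counts.items()))
            rows.append((img, name, summary or "nothing found"))
    return rows

if __name__ == "__main__":
    # Stub detectors standing in for YOLOv5x / YOLOv5s6 inference.
    models = {
        "YOLOv5x": lambda p: ["person", "person", "car"],
        "YOLOv5s6": lambda p: ["person"],
    }
    for row in comparison_table(models, ["street.jpg"]):
        print("{:12} {:10} {}".format(*row))
```

Printing (or committing) the rows for each model run would give exactly the kind of record the tutorial-page table above was assembled from by hand.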
Assigning to @rianadutta since this is about the server-side tradeoffs of quality vs. latency/memory.
#209 moved YOLO to a larger model, but still at the smaller image size. This work item is to also investigate whether it is worth the additional compute time/memory to move to the larger image-dimension models. The hope is that, running on pegasus with the 3090, it would still be very fast in real time. Note that #213 has links to photos that were tested on both the original small YOLO model and the larger one, so even if not comprehensive, they can be used as a basis of comparison.
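For a rough sense of the compute tradeoff: the `*6` variants take 1280 px inputs versus 640 px for the standard models, so the letterboxed input has about 4x the pixels, and convolution cost scales with pixel count. A back-of-envelope sketch, assuming YOLOv5-style letterboxing (long side scaled to the target, dimensions rounded up to a multiple of the stride, 32):

```python
# Back-of-envelope estimate of the input-size cost of the 1280-px models,
# assuming YOLOv5-style letterboxing (stride-32 rounding).
import math

def letterbox_shape(h, w, target, stride=32):
    """Scale (h, w) so the long side equals target, preserving aspect
    ratio, with each dimension rounded up to a multiple of stride."""
    scale = target / max(h, w)
    rnd = lambda x: int(math.ceil(x * scale / stride) * stride)
    return rnd(h), rnd(w)

if __name__ == "__main__":
    for size in (640, 1280):  # standard vs *6 input sizes
        h, w = letterbox_shape(1080, 1920, size)
        print(f"size={size}: input {h}x{w} = {h * w:,} px")
```

For a 1080x1920 photo this gives 384x640 vs 736x1280, roughly a 3.8x increase in pixels, which is consistent with the latency gap observed between the small and large models above.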