Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Investigate moving to larger image size for YOLO #247

Open jeffbl opened 2 years ago

jeffbl commented 2 years ago

#209 moved YOLO to a larger model, but kept the smaller image size. This work item is to investigate whether the additional compute time/memory of the larger-image-dimension models is worth it. The hope is that when running on pegasus with the 3090, it will still be very fast in real-time terms. Note that #213 has links to photos that were tested on both the original small YOLO model and the larger one, so even if not comprehensive, they can be used as a basis for comparison.

jeffbl commented 2 years ago

Assigned to @rohanakut, but pinging @gp1702 since we discussed this earlier today, in case you have anything to add in the comments.

jeffbl commented 2 years ago

Some of them are now missing, IIRC, but the ones I tested last year are listed (with objects found) in #213. You can add to the list there, or create a list using some other mechanism, since this keeps coming up...

jeffbl commented 2 years ago

(i.e., one option would be to have a script that nightly goes and runs object detection on a set of test images, and if results change, flag you, since it may indicate something broke, and it would also give us an ongoing record of how object detection/semseg change over time as we change models, etc.)
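The nightly check described above could be sketched roughly as follows. Everything here is hypothetical: `diff_detections` and the inline data are stand-ins; a real script would call the YOLO preprocessor on a fixed test set and load the baseline from a stored JSON record.

```python
# Rough sketch of the proposed nightly regression check: run object
# detection on a fixed test set, diff the detected labels against a
# stored baseline, and flag any change. The detection results below
# are hypothetical stand-ins for real YOLO service output.
import json
from collections import Counter

def diff_detections(baseline, current):
    """Return per-image label-count changes between two detection runs."""
    changes = {}
    for image in baseline.keys() | current.keys():
        before = Counter(baseline.get(image, []))
        after = Counter(current.get(image, []))
        if before != after:
            changes[image] = {"before": dict(before), "after": dict(after)}
    return changes

if __name__ == "__main__":
    baseline = {"street.jpg": ["person"] * 6 + ["car"] * 4 + ["bus"]}
    current = {"street.jpg": ["person"] * 5 + ["car"] * 4}
    drift = diff_detections(baseline, current)
    if drift:  # in the real script: notify someone and append to the record
        print("Detection drift detected:", json.dumps(drift, sort_keys=True))
```

Keeping the per-run outputs around would also give the ongoing record of how object detection/semseg results change over time as models change.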

rohanakut commented 2 years ago

I have tested two models that are considerably larger than the current one: YOLOv5s6 and YOLOv5sl6 (can be found here). My conclusions were as follows.

YOLOv5sl6 is the biggest model currently available. Testing it on images, I found the output was not very different from the current model's, while it takes around 1.5 s per request versus the current model's 0.3 s. Hence I have decided not to use this model, as there is no significant benefit to including it in our pipeline.

The second model I tested was YOLOv5s6. It also gives results very similar to the current model (added to #213), and takes roughly 0.4 s per request, so this model can be considered.

@jeffbl please let me know if you feel the need to change the current model considering the output. Pinging @gp1702 as well to weigh in on this topic.
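The per-request numbers above could be gathered with a small timing harness; a minimal sketch, where `fn` would be the model's per-request inference call (here just any callable, since the real model is not part of this sketch):

```python
# Minimal latency harness for comparing models, along the lines of the
# 0.3 s vs. 1.5 s per-request numbers above. `fn` stands in for the
# model's inference call on a single request.
import time
from statistics import mean

def mean_latency(fn, inputs, warmup=2):
    """Average wall-clock seconds per call of fn, after a short warmup."""
    for x in inputs[:warmup]:  # warm caches / lazy initialization
        fn(x)
    times = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - start)
    return mean(times)

if __name__ == "__main__":
    # Stand-in workload; replace with model(image) for a real comparison.
    latency = mean_latency(lambda n: sum(range(n)), [10_000] * 20)
    print(f"mean latency: {latency * 1000:.3f} ms")
```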

jeffbl commented 2 years ago

Thanks for testing this. Agreed, that is a massive performance hit for the largest model unless there is a very clear quality benefit. Given our testing is very limited, I'd be happy to take the extra 0.1 s/query to go with YOLOv5s6, since it may make more of a difference in cases we haven't encountered yet. Do you know if it takes significantly more memory?

rohanakut commented 2 years ago

@jeffbl I haven't tested memory consumption, since the current object detection models are not running on the GPU.
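For the CPU-side part of the memory question, one option is to compare the process's peak RSS before and after loading the model. A sketch using only the stdlib `resource` module (Unix-only; note `ru_maxrss` is kilobytes on Linux but bytes on macOS); the `bytearray` is just a stand-in for the extra weights of a larger YOLO variant:

```python
# Sketch of a CPU-memory check: compare the process's peak RSS before
# and after a model load. The bytearray below is a stand-in for the
# additional weights of a larger model (~50 MB here).
import resource

def peak_rss():
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

if __name__ == "__main__":
    before = peak_rss()
    weights = bytearray(50 * 1024 * 1024)  # pretend model load
    after = peak_rss()
    print(f"peak RSS grew by about {after - before} units")
```

GPU memory would need a different probe (e.g. `torch.cuda.max_memory_allocated()` when running on the 3090), which only applies once the models actually run on GPU.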

jeffbl commented 2 years ago

Based on some testing of graphics from our tutorial page, YOLOv5s6 does not seem to be an improvement:

| tutorial graphic | YOLOv5x (pegasus) | YOLOv5s6 (unicorn) |
| --- | --- | --- |
| mountain landscape | only object found was a horse | nothing found |
| street scene | 6 people, 2 umbrellas, 2 traffic lights, 4 cars, a bus, and a backpack | 5 people, 4 cars, a backpack, an umbrella, and a traffic light |
| dining | 5 bowls, 4 chairs, 2 vases, 4 wine glasses, 4 people, a cup, a potted plant, a couch, a dining table, and a bottle | 3 chairs, 5 people, 1 potted plant, 1 bottle, 1 bowl, and 1 dining table |

The hypothesis is that since YOLOv5s6 has far fewer parameters than the in-production YOLOv5x, it struggles to identify objects despite the larger input resolution. Based on these examples, moving from YOLOv5x to YOLOv5s6 would not be a win. We could try a larger high-resolution model, but the gains may not be large anyway.

To really do a thorough job, we would need a test set of representative graphics and a script that outputs the objects found at various model sizes. However, after discussion with @rohanakut, I'm postponing this work item until after CSUN (backlog), when it should be considered alongside moving to Azure or another object detection solution.

jeffbl commented 2 years ago

Assigning to @rianadutta since this is about the server-side tradeoffs of quality vs. latency/memory.