SWAMP-Blimps / CatchingBlimp


Prototype YOLO training pipeline using Florence-2 as dataset generator #30

Open sah4jpatel opened 5 days ago

sah4jpatel commented 5 days ago

Florence-2 is a new VLM that can very accurately find specific objects in images, such as "purple balloon," "red blimp," or "yellow square," even in poor conditions such as low light, bad angles, and partially visible objects. Running it in real time is likely not possible yet, since its ONNX support is still too unrefined to work with RKNN. Even if RKNN worked, the model likely couldn't run in real time. Instead, YOLOv8 or YOLOv10 might be good models to train on a dataset generated with Florence-2.

Proposed plan:

```mermaid
graph LR;
    A["Split Competition Dataset\nFootage into Frames"]-->B["Process Images with Florence-2\n with OPEN_VOCABULARY_DETECTION tag"];
    B-->C["Feed Image-BBox dataset into YOLO"];
```

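For the last arrow in the plan, the Florence-2 boxes would need to be rewritten as YOLO label lines (`class cx cy w h`, normalized to the image size). A minimal sketch, assuming the post-processed Florence-2 output gives pixel-space `[x1, y1, x2, y2]` boxes under `bboxes` with matching `bboxes_labels` (worth checking against the real output), and using a hypothetical `CLASS_IDS` mapping:

```python
# Hypothetical mapping from Florence-2 prompt labels to YOLO class ids.
CLASS_IDS = {"purple balloon": 0, "red blimp": 1, "yellow square": 2}


def to_yolo_lines(detections, img_w, img_h, class_ids=CLASS_IDS):
    """Turn pixel-space [x1, y1, x2, y2] boxes into YOLO 'class cx cy w h' label lines.

    Assumes `detections` looks like
    {"bboxes": [[x1, y1, x2, y2], ...], "bboxes_labels": ["red blimp", ...]}.
    """
    lines = []
    for box, label in zip(detections["bboxes"], detections["bboxes_labels"]):
        if label not in class_ids:
            continue  # ignore labels outside the training classes
        x1, y1, x2, y2 = box
        # Order the corners and clamp them to the image bounds.
        left, right = max(0.0, min(x1, x2)), min(float(img_w), max(x1, x2))
        top, bottom = max(0.0, min(y1, y2)), min(float(img_h), max(y1, y2))
        w, h = right - left, bottom - top
        if w <= 0 or h <= 0:
            continue  # degenerate box after clamping
        cx, cy = (left + right) / 2 / img_w, (top + bottom) / 2 / img_h
        lines.append(f"{class_ids[label]} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return lines
```

Each returned line would go into a `.txt` label file that shares its name with the frame it describes.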
sah4jpatel commented 5 days ago

thoughts? @Rhyme0730

Rhyme0730 commented 4 days ago

What do you mean by 'Even if RKNN worked, this model likely couldn't run in real-time'? So you mean we can use this VLM offline to train our model in a better way, and YOLOv8 or YOLOv10 would be compatible with the data from this VLM. Is that right?

sah4jpatel commented 4 days ago

Yeah, we can just use a VLM to generate our dataset, which can then be used to train a YOLO model.
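On the training side, Ultralytics-style YOLO trainers read a small dataset YAML that points at the image folders and lists the class names. A rough sketch of generating one; the `datasets/blimps` root, the folder names, and the class list are placeholders, not anything decided here:

```python
def dataset_yaml(root, class_names, train="images/train", val="images/val"):
    """Return dataset config text with path/train/val/names keys (Ultralytics layout)."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    # One "id: name" entry per class, ids assigned in list order.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder root and classes, for illustration only.
    print(dataset_yaml("datasets/blimps", ["purple balloon", "red blimp"]))
```

Saved as something like `data.yaml`, this could then be passed to training, for example `yolo detect train data=data.yaml model=yolov8n.pt`, assuming the Ultralytics CLI is what we end up using.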

I can build out that dataset generation script once I get the footage we recorded at the last competition. Even the smallest VLMs I've seen can't run on SBCs with an inference time under 1 second.
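The first step of that script could look roughly like this: sample frames from the footage at a fixed rate instead of saving every frame. It assumes OpenCV (`cv2`) is installed; `frame_stride`, `extract_frames`, the file naming, and the 2 frames-per-second default are illustrative choices only.

```python
from pathlib import Path


def frame_stride(video_fps, target_fps):
    """Number of source frames to advance per saved frame (at least 1)."""
    if video_fps <= 0 or target_fps <= 0:
        return 1  # FPS unknown or invalid: fall back to keeping every frame
    return max(1, round(video_fps / target_fps))


def extract_frames(video_path, out_dir, target_fps=2.0):
    """Save roughly `target_fps` frames per second of video as JPEGs; return the count saved."""
    import cv2  # imported here so frame_stride works without OpenCV installed

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    stride = frame_stride(cap.get(cv2.CAP_PROP_FPS), target_fps)
    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video or read error
        if index % stride == 0:
            cv2.imwrite(str(out_dir / f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

Since Florence-2 only runs offline here, the sampling rate mostly trades dataset size against labeling time.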

Rhyme0730 commented 4 days ago

I see, so the trained YOLO model can be effective even in difficult conditions like strong light. It's worth a try, and the Opi is good enough to run a YOLOv8/v10 model. Btw, do you have the past datasets now? I'd also like to get them to train a YOLOv5 model and see how it performs.