RoBorregos / home-vision

Computer Vision resources and packages for the Robocup@Home Competition
GNU General Public License v3.0

Vision-Language Model (Moondream2) #31

Open EmilianoHFlores opened 2 months ago

EmilianoHFlores commented 2 months ago

Some @Home tasks require specific information from an image, ranging from person descriptions to fine details within the scene (e.g. how many people are raising their hand). Each of these tasks currently needs either its own dedicated script or a separately trained model. Recent vision-language models such as GPT-Vision or Moondream can handle these tasks.

This issue tracks the Docker + ROS integration of Moondream2 as an alternative way to handle these tasks, with the model running on an NVIDIA Jetson Xavier.
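
As a rough illustration of the idea, the sketch below shows how several of these tasks could share one model behind a single entry point. Everything in it is an assumption made for illustration: the names `VisionQueryService`, `TASK_PROMPTS`, `VlmBackend` and `echo_backend` are hypothetical, the prompts are placeholders, and the actual Moondream2 inference and ROS wiring are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backend signature: (image_bytes, question) -> answer text.
# A real backend would wrap Moondream2 inference; it is left abstract here.
VlmBackend = Callable[[bytes, str], str]

# Hypothetical task names and prompts, chosen only for illustration.
TASK_PROMPTS: Dict[str, str] = {
    "describe_person": "Describe the person in the image.",
    "count_raised_hands": "How many people are raising their hand?",
}


@dataclass
class VisionQueryService:
    """Maps a named task to a prompt and forwards it to one shared backend."""

    backend: VlmBackend

    def run_task(self, task: str, image: bytes) -> str:
        if task not in TASK_PROMPTS:
            raise KeyError(f"unknown task: {task}")
        # The same backend serves every task; only the prompt changes.
        return self.backend(image, TASK_PROMPTS[task]).strip()


def echo_backend(image: bytes, question: str) -> str:
    """Stand-in backend that echoes the question, so the flow runs without a model."""
    return f"[{len(image)} bytes] {question}"
```

With this layout, supporting a new task would mean adding a prompt entry rather than writing a new script, and a ROS service callback could simply call `run_task` with the decoded camera frame.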