adithya-s-k / YoloGemma

Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detection and segmentation.
MIT License
77 stars, 6 forks

fine-tune code #1

Open JunMa11 opened 4 months ago

JunMa11 commented 4 months ago

Dear @adithya-s-k ,

Thanks for the great repo.

Do you have any plans to share a fine-tuning script on customized datasets?

adithya-s-k commented 4 months ago

Hey, the project is built around PaliGemma.

Here are some references:
https://ai.google.dev/gemma/docs/paligemma
https://huggingface.co/blog/paligemma

Fine-tuning guide: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/PaliGemma/Fine_tune_PaliGemma_for_image_%3EJSON.ipynb

Hope this helps

JunMa11 commented 4 months ago

Hi @adithya-s-k ,

Thanks for your swift response!

I'll prepare my data as follows:

Would you happen to have a suggested script for converting bounding boxes and segmentation masks into tokens that PaliGemma can use?
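
For the bounding-box half of this question, a minimal sketch is below. It is not code from this repo. It assumes the commonly documented PaliGemma detection format: four `<locNNNN>` tokens per box in the order y_min, x_min, y_max, x_max, each coordinate normalized by the image size and quantized into 1024 bins (0000 to 1023), followed by the class label, with multiple objects joined by " ; ". The function names and the exact rounding rule are hypothetical choices for illustration and should be checked against the preprocessing used in your fine-tuning pipeline.

```python
def box_to_loc_tokens(box, image_width, image_height, bins=1024):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to four <locNNNN> tokens.

    Assumption: coordinates are scaled to [0, bins - 1] and rounded to the
    nearest bin. Other pipelines may floor instead of round.
    """
    x_min, y_min, x_max, y_max = box

    def to_bin(value, size):
        # Normalize by the image side, scale to the bin range, then clip.
        index = int(round(value / size * (bins - 1)))
        return min(max(index, 0), bins - 1)

    # PaliGemma orders the tokens y first, then x.
    ordered = (
        to_bin(y_min, image_height),
        to_bin(x_min, image_width),
        to_bin(y_max, image_height),
        to_bin(x_max, image_width),
    )
    return "".join(f"<loc{v:04d}>" for v in ordered)


def detection_target(objects, image_width, image_height):
    """Build the target string for one image.

    `objects` is a list of (box, label) pairs with boxes in pixel coordinates.
    """
    parts = [
        f"{box_to_loc_tokens(box, image_width, image_height)} {label}"
        for box, label in objects
    ]
    return " ; ".join(parts)


if __name__ == "__main__":
    example = [((160, 120, 640, 480), "cat")]
    print(detection_target(example, image_width=640, image_height=480))
```

Segmentation masks are a different case: as far as I know, the `<segNNN>` tokens come from a learned VQ-VAE mask encoder rather than simple arithmetic, so they cannot be produced by a short script like this one without that model.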