Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License
3.52k stars 241 forks

Creating custom use cases based on specific images #297

Open smtabatabaie opened 8 months ago

smtabatabaie commented 8 months ago

Very awesome project, thanks very much. I wanted to ask whether it is possible to provide custom images and have the model train on them. For example, creating a manual for a specific tool or machine based on additional visual data. Kind of like RAG, but for images. Thanks

Luodian commented 8 months ago

I was considering relaxing the data format to support loading images from a local path, so you can incrementally add images to a pool and let the dataloader sample from that pool. (Currently everything is packed into parquet, which is not that convenient.)

But I expect to do it next month, since this month there's the CVPR deadline and many finals...
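
For readers wondering what the "local image pool" idea above might look like, here is a minimal sketch using only the Python standard library. It is not Otter's actual dataloader or data format: the class name `LocalImagePool`, its methods, and the list of image suffixes are all hypothetical, chosen only to illustrate rescanning a folder so that newly added images become available for sampling.

```python
import random
from pathlib import Path

# Assumption: which file extensions count as images (not taken from Otter).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


class LocalImagePool:
    """Hypothetical pool of image paths backed by a local directory."""

    def __init__(self, root, seed=None):
        self.root = Path(root)
        self._rng = random.Random(seed)
        self.paths = []
        self.refresh()

    def refresh(self):
        """Rescan the directory so newly added images join the pool."""
        self.paths = sorted(
            p for p in self.root.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        return len(self.paths)

    def sample(self, k):
        """Return up to k distinct image paths chosen at random."""
        k = min(k, len(self.paths))
        return self._rng.sample(self.paths, k)
```

In this sketch, calling `refresh()` between epochs would pick up any images dropped into the folder since the last scan, and `sample(k)` returns paths only; decoding the images and pairing them with instructions would still be up to the training code.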