base64 image encoding and visual model differences for training and inference (dataset && evaluation)

Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

https://otter-ntu.github.io/

MIT License


base64 image encoding and visual model differences for training and inference (dataset && evaluation) #262

Closed xmc-andy closed 11 months ago

xmc-andy commented 11 months ago

Thank you very much for your work! I have one question that I hope can be answered: why is base64 image encoding used, and what is the reason for this design? In my experience, storing base64-encoded images in JSON during preprocessing increases memory usage.
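To make the overhead concrete, here is a minimal sketch of the size increase I am referring to. It is not taken from the Otter codebase: the payload is random bytes standing in for a compressed image file, and the `image_0` key is a hypothetical name used only for illustration. Base64 maps every 3 input bytes to 4 ASCII characters, so the encoded string is roughly one third larger than the raw bytes before any JSON wrapping is added.

```python
import base64
import json
import os

# Stand-in for the bytes of a compressed image file (assumption: 30,000 bytes).
raw = os.urandom(30_000)

# Encode to base64 text so the image can be stored as a JSON string value.
encoded = base64.b64encode(raw).decode("ascii")

# Hypothetical record layout; the key name is illustrative only.
record = json.dumps({"image_0": encoded})


def b64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


print("raw bytes:     ", len(raw))
print("base64 chars:  ", len(encoded))
print("JSON record:   ", len(record))
print("overhead ratio:", len(encoded) / len(raw))

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
```

With many images held in a single JSON file, this one-third increase applies to every image, and the whole file typically has to be parsed into memory at once.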