TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)
MIT License

Feature request #19

Closed (lucasjinreal closed this 4 months ago)

lucasjinreal commented 5 months ago

Hi, thanks for this versatile repo for combining different VEs and LLMs.

However, I couldn't find any comparison between a Resampler and an MLP.

Would you consider adding an experiment that compares the Resampler and the MLP?

siddk commented 4 months ago

Hey @lucasjinreal -- sorry that I haven't responded to this sooner! It should be fairly straightforward to add new "vision-to-language" projection layers; you should just need to override the logic in the PrismaticVLM class.

Let me know if you have any trouble!
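
To make the two options in this thread concrete, here is a minimal PyTorch sketch of an MLP projector and a single-layer resampler. It is not code from this repository: the class names `MLPProjector` and `ResamplerProjector`, their constructor arguments, and the single cross-attention layer are all assumptions for illustration, and how either module would plug into `PrismaticVLM` is left open.

```python
import torch
import torch.nn as nn


class MLPProjector(nn.Module):
    """Hypothetical per-patch MLP: keeps every patch token, maps vision_dim -> llm_dim."""

    def __init__(self, vision_dim: int, llm_dim: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: [batch, num_patches, vision_dim] -> [batch, num_patches, llm_dim]
        return self.net(patches)


class ResamplerProjector(nn.Module):
    """Hypothetical one-layer resampler: learned queries cross-attend to the patches,
    so the output length is num_queries regardless of how many patches come in."""

    def __init__(
        self, vision_dim: int, llm_dim: int, num_queries: int = 64, num_heads: int = 8
    ) -> None:
        super().__init__()
        # Learned query tokens, shared across the batch (assumed small-scale init).
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim) * 0.02)
        # Project patch features to llm_dim so they can serve as keys and values.
        self.kv_proj = nn.Linear(vision_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(llm_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: [batch, num_patches, vision_dim] -> [batch, num_queries, llm_dim]
        kv = self.kv_proj(patches)
        q = self.queries.unsqueeze(0).expand(patches.shape[0], -1, -1)
        out, _ = self.attn(q, kv, kv)
        return self.norm(out)


if __name__ == "__main__":
    patches = torch.randn(2, 256, 32)  # toy sizes, not real encoder dimensions
    print(MLPProjector(32, 64)(patches).shape)
    print(ResamplerProjector(32, 64, num_queries=16, num_heads=4)(patches).shape)
```

The practical difference for a comparison is the number of tokens handed to the LLM: the MLP passes one token per patch, while the resampler compresses the patches into a fixed number of query tokens.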