lucasjinreal closed this 4 months ago
Hey @lucasjinreal -- sorry that I haven't responded to this sooner! It should be fairly straightforward to add new "vision-to-language" projection layers; you should just need to override the logic in the PrismaticVLM class.
Let me know if you have any trouble!
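For anyone following along, here is a rough sketch of what the two projector styles discussed in this thread might look like as standalone PyTorch modules. This is not the repo's actual code: `MLPProjector`, `ResamplerProjector`, and all of their parameters are hypothetical names chosen for illustration, and the resampler is a simplified single cross-attention block rather than a full Perceiver-style stack. How such a module is wired into PrismaticVLM is left out, since the thread does not spell that out.

```python
import torch
import torch.nn as nn


class MLPProjector(nn.Module):
    """Hypothetical two-layer MLP: one output token per input patch."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: [batch, num_patches, vision_dim]
        return self.net(patches)  # [batch, num_patches, llm_dim]


class ResamplerProjector(nn.Module):
    """Hypothetical simplified resampler: learned queries cross-attend
    to the patch features, so the output length is fixed at num_queries."""

    def __init__(self, vision_dim: int, llm_dim: int,
                 num_queries: int = 64, num_heads: int = 8):
        super().__init__()
        # Learned query tokens, shared across the batch.
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim) * 0.02)
        self.in_proj = nn.Linear(vision_dim, llm_dim)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(llm_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: [batch, num_patches, vision_dim]
        kv = self.in_proj(patches)
        q = self.queries.unsqueeze(0).expand(patches.shape[0], -1, -1)
        out, _ = self.attn(q, kv, kv)  # queries attend to patch features
        return self.norm(out + q)      # [batch, num_queries, llm_dim]


if __name__ == "__main__":
    patches = torch.randn(2, 256, 1024)  # dummy vision-encoder output
    print(MLPProjector(1024, 512)(patches).shape)        # torch.Size([2, 256, 512])
    print(ResamplerProjector(1024, 512)(patches).shape)  # torch.Size([2, 64, 512])
```

The practical difference for a comparison is the token count: the MLP passes every patch through to the LLM, while the resampler compresses the patches down to a fixed number of query tokens.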
Hi, thanks for the versatile repo for combining different VEs and LLMs.
However, I couldn't find any comparison between the Resampler and MLP projectors.
Would you consider adding an experiment that compares the Resampler with MLPs?