EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

[Project] Flaminglet - Multimodal adapters for extending capabilities of PLMs #50

Closed TheodoreGalanos closed 1 year ago

TheodoreGalanos commented 2 years ago

Flaminglet :)

The idea is simple: train tiny Flamingo-style models on top of NeoX and evaluate their performance on vision-language tasks. Image generation would probably be out of scope, but tasks like captioning, VQA, etc. should be possible. This is essentially a multimodal adapter approach, where we insert cross-attention layers between a visual backbone and a LM, on the LM side.
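For reference, here is a rough sketch of what one such inserted layer could look like. This is plain PyTorch rather than anything from the NeoX codebase, and all names, dimensions, and details are illustrative; only the overall gated cross-attention idea follows Flamingo:

```python
import torch
import torch.nn as nn

class GatedCrossAttentionAdapter(nn.Module):
    """Minimal Flamingo-style adapter: text tokens attend to visual features,
    with tanh gates initialized to zero so the frozen LM is unchanged at init."""

    def __init__(self, d_model: int, n_heads: int, d_visual: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=d_model, num_heads=n_heads,
            kdim=d_visual, vdim=d_visual, batch_first=True,
        )
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model),
        )
        self.norm_attn = nn.LayerNorm(d_model)
        self.norm_ff = nn.LayerNorm(d_model)
        # Zero-initialized gates -> the adapter starts as an identity function.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ff_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden, visual_feats):
        # text_hidden: (batch, seq_len, d_model); visual_feats: (batch, n_vis, d_visual)
        attn_out, _ = self.cross_attn(self.norm_attn(text_hidden), visual_feats, visual_feats)
        x = text_hidden + torch.tanh(self.attn_gate) * attn_out
        x = x + torch.tanh(self.ff_gate) * self.ff(self.norm_ff(x))
        return x

# Toy forward pass with made-up shapes.
adapter = GatedCrossAttentionAdapter(d_model=512, n_heads=8, d_visual=768)
out = adapter(torch.randn(2, 16, 512), torch.randn(2, 64, 768))
```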

The output would be a new paper, along with newly pretrained models that can handle novel tasks.

Major milestones would be to:

1. Adjust the NeoX codebase to allow for Flamingo-like training (see the sketch after this list).
2. Identify potential ablations and interesting additions to the model architecture.
3. Add multimodal benchmarks to the eval harness for evaluation (perhaps including new multimodal reasoning benchmarks like Winoground).
4. Train a bunch of models.
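As a rough illustration of what milestone (1) implies, assuming we follow Flamingo and keep both the LM and the vision encoder frozen while only the new cross-attention adapters are trained. The function and placeholder modules below are hypothetical and not part of the NeoX codebase:

```python
import torch
import torch.nn as nn

def freeze_for_adapter_training(language_model: nn.Module,
                                vision_encoder: nn.Module,
                                adapters: nn.Module,
                                lr: float = 1e-4):
    """Freeze the pretrained LM and visual backbone; only the newly inserted
    cross-attention adapters receive gradients (Flamingo-style training)."""
    for p in language_model.parameters():
        p.requires_grad = False
    for p in vision_encoder.parameters():
        p.requires_grad = False
    for p in adapters.parameters():
        p.requires_grad = True
    # The optimizer only ever sees the adapter parameters.
    return torch.optim.AdamW(adapters.parameters(), lr=lr)

# Toy example with stand-in modules; in practice these would be the NeoX LM,
# a vision backbone (e.g. a CLIP ViT), and the gated cross-attention blocks.
lm = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
vit = nn.Linear(768, 512)
adapter_stack = nn.ModuleList([nn.Linear(512, 512)])
optimizer = freeze_for_adapter_training(lm, vit, adapter_stack)
```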

Ehm, I'll probably need help with everything :) This is just an idea right now, and I have little exposure to the NeoX architecture / Flamingo implementation.

We would need compute for finetuning a series of models, including ablations. I'm not certain what the requirements would be exactly, but it would certainly be less than finetuning a full NeoX model. If we followed Flamingo, the 'adapter' modules could be around 850M parameters total, with roughly 1/2 and 1/4 of that number for possible 'every N layers' ablations. Even smaller ratios could be ablated as well.
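To make the 'every N layers' knob concrete, here is a toy back-of-the-envelope calculation. The dimensions and layer counts below are placeholders, not sized to match any particular NeoX checkpoint or the 850M figure above, and it ignores biases, layer norms, and any Perceiver-style resampler:

```python
def adapter_params(d_model: int, d_visual: int, n_lm_layers: int, every_n: int) -> int:
    # One gated cross-attention block: attention projections (query from d_model,
    # key/value from d_visual, output back to d_model) plus a 4x feed-forward.
    attn = 2 * d_model * d_model + 2 * d_visual * d_model
    ff = 2 * (d_model * 4 * d_model)
    per_block = attn + ff
    n_blocks = n_lm_layers // every_n
    return per_block * n_blocks

# Placeholder model sizes, just to show how the total scales with N.
for every_n in (1, 2, 4):
    total = adapter_params(d_model=2048, d_visual=1024, n_lm_layers=24, every_n=every_n)
    print(f"every {every_n} layers -> ~{total / 1e6:.0f}M adapter params")
```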