adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning
https://docs.adapterhub.ml
Apache License 2.0

Adapter Configuration - Mixture of Experts #626

Closed · simon-lund closed this 8 months ago

simon-lund commented 8 months ago

Hello,

I am a computer science student at LMU Munich, and as part of my master's thesis I am working on fine-tuning vision-language models in a multitask setting. The focus of my work is on Mixture-of-Experts models, which select and combine relevant adapters for different inputs.

During my research, I came across your adapter framework. I have already read some of the documentation, but I am still unsure whether it would be possible to “simply” add such a meta-module. I would like to contribute such a module to this framework, but I would appreciate a little help on where to start.

Beyond that, I would like to ask whether existing models can be extended. Specifically, a combination of CLIP + Llama would allow me to reimplement the code from a paper (Octavius).

calpt commented 8 months ago

Hey @simon-lund!

Thanks for reaching out, this sounds very interesting. We'd be happy to see your work integrated into our library. I haven't studied your topic and the linked paper in detail yet, so these are only preliminary, high-level answers:

> During my research, I came across your adapter framework. I have already read some of the documentation, but I am still unsure whether it would be possible to “simply” add such a meta-module. I would like to contribute such a module to this framework, but I would appreciate a little help on where to start.

Yes, in principle it should be possible to integrate such new modules into the library. I think what you're trying to achieve most closely resembles our composition blocks and could be implemented as a new composition block type, so I'll try to give some context here:
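
For illustration, here is a minimal sketch of how existing composition blocks are activated (assuming the current `adapters` API; the base model and adapter names are just placeholders). A Mixture-of-Experts router that selects adapters per input would be implemented as a new block of this kind:

```python
# Minimal sketch of how existing composition blocks combine adapters.
# The base model and adapter names are placeholders for illustration only.
import adapters
import adapters.composition as ac
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
adapters.init(model)  # enable adapter support on a plain Transformers model

# Two bottleneck adapters that a composition block can combine.
model.add_adapter("task_a", config="seq_bn")
model.add_adapter("task_b", config="seq_bn")

# Existing blocks such as Average (or Stack, Parallel, Fuse) already define how
# several adapters are combined per forward pass; an MoE block with a learned,
# input-dependent router would be a new block of this kind.
model.set_active_adapters(ac.Average("task_a", "task_b"))
```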

> Beyond that, I would like to ask whether existing models can be extended. Specifically, a combination of CLIP + Llama would allow me to reimplement the code from a paper (Octavius).

In general, both CLIP and Llama are already supported by our library. In principle, it should be possible to compose these models using standard methods provided by Transformers, e.g. by joining vision encoders and text decoders as described here. This might require some additional tweaking to work well with adapters, though.
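
As a rough illustration, joining a vision encoder and a text decoder with standard Transformers functionality could look like the sketch below. It uses the ViT + GPT-2 pairing from the Transformers docs as a stand-in; swapping in CLIP's vision tower and a Llama decoder, and getting adapters to work on top of the composed model, is where the extra tweaking would come in. The adapter name is a placeholder.

```python
# Rough sketch: compose a vision encoder with a text decoder via Transformers.
# ViT + GPT-2 is used as a stand-in; CLIP + Llama would follow the same pattern
# but may need additional adjustments, as mentioned above.
import adapters
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder
    "gpt2",                               # text decoder (cross-attention is added)
)

# Assumption: whether adapters.init handles the composed model out of the box
# may need checking; otherwise, adapters can be added to model.encoder and
# model.decoder individually.
adapters.init(model)
model.add_adapter("multimodal_task", config="seq_bn")
model.train_adapter("multimodal_task")  # freeze the base model, train only the adapter
```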

More generally, our contributing guides on adding new model support and adding new adapter methods might also provide helpful context in this regard.

Hope these general pointers are somewhat helpful. Happy to help with more specific questions/issues related to your concrete use case!