explosion / curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

Optimal QLoRA settings #316

Open · KnutJaegersberg opened this issue 1 year ago

KnutJaegersberg commented 1 year ago

In HF transformers, the default QLoRA settings do not replicate the setup of the original paper, leaving valuable performance on the table for ML practitioners who rely on library defaults.
LoRA has to be applied to all linear layers of the network, not just the default attention projections; please see this tweet by Tim Dettmers:

https://twitter.com/Tim_Dettmers/status/1695377756232589459
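
For concreteness, here is a minimal sketch of what this looks like with Hugging Face `peft` (not curated-transformers). The hyperparameters follow the QLoRA paper, and the `target_modules` names assume a Llama-style architecture; they differ per model:

```python
from peft import LoraConfig

# Common examples target only the attention projections
# (e.g. ["q_proj", "v_proj"]), which is not what the QLoRA paper did.
# The paper attaches LoRA adapters to *all* linear layers. For a
# Llama-style model those modules are typically:
config = LoraConfig(
    r=64,                  # adapter rank used in the QLoRA paper
    lora_alpha=16,         # scaling factor used in the paper
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # feed-forward
    ],
)
# Recent peft versions also accept target_modules="all-linear",
# which covers every linear layer except the output head.
```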

I guess this has to be customized for each model architecture, which sounds like a feature for curated-transformers to me.
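
One way to avoid hand-maintaining a per-architecture list is to discover the linear layers programmatically. A rough sketch over a plain PyTorch module (the `lora_target_modules` helper is hypothetical; excluding `lm_head` is an assumption, since the output head is normally left unadapted):

```python
import torch.nn as nn

def lora_target_modules(model: nn.Module) -> list[str]:
    """Collect the leaf names of all nn.Linear submodules,
    so LoRA can target every linear layer of any architecture."""
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            names.add(full_name.rsplit(".", 1)[-1])
    names.discard("lm_head")  # output head is usually not adapted
    return sorted(names)
```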

danieldk commented 1 year ago

Thanks for the suggestion! We hope to look more into training in the coming period and will definitely take this into account.