fdschmidt93 / trident-nllb-llm2vec

Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages"
MIT License

Release of Pre-trained models #2

Open ArkadeepAcharya opened 1 week ago

ArkadeepAcharya commented 1 week ago

Request for Release of Pretrained NLLB-LLM2Vec Model

Hello Team,

Could you please release the pretrained NLLB-LLM2Vec models mentioned in your paper on "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages"? It would greatly benefit the community by facilitating further research.

Thank you for your contributions.

Best regards, Arkadeep Acharya

fdschmidt93 commented 1 week ago

Hi Arkadeep,

Thanks a lot for your interest in our work!

Yes, I very much plan on making the models available. :)

I am currently working on refining Stage 1, such that Stage 2 won't be necessary. My sincere hope is that I can then release a single pre-trained model which can easily be fine-tuned on any downstream task without task distillation for maximum performance.

In any case, I will make the self-supervised adapted model (S1) of the paper available asap. Unfortunately, directly fine-tuning that will only give you good performance if you have sizable training data (like for NLI, Belebele).

Cheers, Fabian

ArkadeepAcharya commented 1 week ago

Thanks Fabian! Looking forward to the model release!

fdschmidt93 commented 6 days ago

As a quick update: sharing the model on the Hugging Face Hub is surprisingly difficult, since it has to be correctly quantized and LoRAfied prior to loading the weights. transformers, peft, and bitsandbytes don't play nicely together when setting up `AutoModel.from_pretrained` the conventional way. Unfortunately, none of this is well documented.

Between having been sick and working on the more general model, I haven't yet had sufficient time to figure out how best to upload the model so that it is most easily used, i.e.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("fdschmidt/nllb-llm2vec-v0.1")
```

I might have to package it more generally as an nn.Module (cf. https://huggingface.co/docs/hub/models-uploading#upload-a-pytorch-model-using-huggingfacehub). I'll be on vacation next week but will try to squeeze it in.