Hi Arkadeep,
Thanks a lot for your interest in our work!
Yes, I very much plan on making the models available. :)
I am currently working on refining Stage 1, such that Stage 2 won't be necessary. My sincere hope is that I can then release a single pre-trained model which can easily be fine-tuned on any downstream task without task distillation for maximum performance.
In any case, I will make the self-supervised adapted model (S1) from the paper available ASAP. Unfortunately, directly fine-tuning it will only give you good performance if you have sizable training data (as for NLI or Belebele).
Cheers, Fabian
Thanks Fabian! Looking forward to the model release!
As a quick update: sharing the model on the Hugging Face Hub is surprisingly difficult, since it has to be correctly quantized and LoRAfied prior to loading the weights. `transformers`, `peft`, and `bitsandbytes` don't play nicely together when setting up `AutoModel.from_pretrained` the conventional way, and unfortunately none of this is well documented.
Between having been sick and working on the more general model, I haven't yet had sufficient time to figure out how best to upload the model so that it is most easily used, i.e.
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("fdschmidt/nllb-llm2vec-v0.1")
```
I might have to package it more generally as an `nn.Module` (cf. https://huggingface.co/docs/hub/models-uploading#upload-a-pytorch-model-using-huggingfacehub). I'll be on vacation next week but will try to squeeze it in.
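For reference, that generic route looks roughly like this (a sketch with a dummy module body and the placeholder repo id from above; the actual NLLB-LLM2Vec packaging may end up differing):

```python
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

# Dummy module standing in for the real architecture; the mixin adds
# push_to_hub / from_pretrained without going through AutoModel.
class WrappedModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

model = WrappedModel()
model.push_to_hub("fdschmidt/nllb-llm2vec-v0.1")              # placeholder repo id
reloaded = WrappedModel.from_pretrained("fdschmidt/nllb-llm2vec-v0.1")
```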
Hey Fabian!
I know you are probably busy building some exciting stuff, but have you had the chance to upload the weights? Even a link to S3 or Google Drive would be much appreciated.
My primary interest is fine-tuning it further.
Thanks
Hi there,
I'm actually, as I write this, trying to iron out the very last issues (famous last words :crossed_fingers:) of an initial release of NLLB-LLM2Vec on Llama 3.1 8B.
That release will seamlessly support the following entry points (a rough usage sketch follows the list):

- `AutoModel.from_pretrained("fdschmidt93/...")`
- `AutoModelForSequenceClassification.from_pretrained("fdschmidt93/...")`
- `AutoModelForTokenClassification.from_pretrained("fdschmidt93/...")`
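For example, fine-tuning for sequence classification would look roughly like this (a sketch only; the repo id is still elided as above, and `trust_remote_code=True` is an assumption for a custom architecture on the Hub):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "fdschmidt93/..."  # placeholder, elided as above

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id,
    num_labels=3,             # e.g. for 3-way NLI
    trust_remote_code=True,   # assumed, since the architecture is custom
)
# From here, fine-tune with your usual Trainer / training loop.
```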
Unfortunately, wrapping truly custom models for the Hugging Face Hub is simply a bad developer experience (undocumented and unusual behavior, plus other issues like https://github.com/huggingface/transformers/pull/33844).
Anyhow, that release should ideally be close in performance to S1+S2 while only requiring S1+FT (if that's unclear, please refer to the paper).
https://huggingface.co/fdschmidt93/NLLB-LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse
Here is the model, with usage instructions in the README. The model doesn't have much mileage on it yet, so I'd kindly ask you to report any issues you run into quickly, and I'll fix them ASAP :)
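A rough sketch of what loading it for embeddings looks like (the README is the authoritative reference; `trust_remote_code=True`, the `last_hidden_state` output, and mean pooling are assumptions of this sketch):

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "fdschmidt93/NLLB-LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer(
    ["Ein Beispielsatz.", "Another example sentence."],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state          # (batch, seq, dim), assumed output

# Masked mean pooling over tokens to get sequence-level embeddings.
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```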
@fdschmidt93 Thanks a lot!
Hi @fdschmidt93, can you please clarify whether this is a stage-1-only model or whether it has gone through both stage-1 and stage-2 training?
Hi @ArkadeepAcharya
As stated in the README of the model, this version has yet to be fine-tuned for a downstream task. Hence, the model has only been trained for stage 1. It should nevertheless perform notably better than the stage-1 model in the paper, as it has been trained with the improved self-supervised setup mentioned below.
I unfortunately won't be releasing stage-2 models, as I will not be fine-tuning models per task due to a lack of time and compute.
There would be an argument for doing something like GritLM (cf. the paper) and then doing distillation to have a single model for 'all tasks', but I don't have the capacity (GPUs, time) to do that. I invested a lot of time in improving the self-supervised stage as much as possible.
NLLB-LLM2Vec should be used if you need sequence-level embeddings for lower-resourced languages that industry-scale models (in terms of samples, supervision, etc.) like NVEmbed, GritLM, E5, or BGE don't cover, or in academic settings where you want to be more confident that the task has not leaked (although instruction fine-tuning of Llama itself may have leakage).
I hope this clarifies any questions you might have. Let me know if there's any follow-up you would like to discuss.
Request for Release of Pretrained NLLB-LLM2Vec Model
Hello Team,
Could you please release the pretrained NLLB-LLM2Vec models mentioned in your paper, "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages"? It would greatly benefit the community by facilitating further research.
Thank you for your contributions.
Best regards, Arkadeep Acharya