g-karthik opened this issue 7 months ago (status: Open)
The latest transformers can be used to fine-tune BGE-M3.
TypeError: unhashable type: 'dict'
This error is due to the tokenizer_config.json. We have updated the file in huggingface, and you can download it and re-run the code.
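For example, something along these lines should pull the updated file instead of the cached copy (assuming a stale local Hugging Face cache is what triggers the error):

```python
from transformers import AutoTokenizer

# force a fresh download so the updated tokenizer_config.json replaces the cached one
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3", force_download=True)
```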
Thanks for your response - I unblocked myself on this error by downgrading to transformers 4.34, but I can test with the latest transformers as well. Is the following comment in your README no longer true? If so, perhaps you could remove it from the README.
The new version of Transformers may pose issues for fine-tuning. If you encounter problems, you can try to downgrade to versions 4.33-4.36.
On the PEFT/adapter use case I mentioned in my issue: I found that I need to make changes to sentence_transformers in my Python site-packages (in the conda env) to ensure the line below runs correctly during fine-tuning. https://github.com/FlagOpen/FlagEmbedding/blob/0fe98ddefbd9d2caced52e94b9f8421d5ba7317f/FlagEmbedding/baai_general_embedding/finetune/trainer.py#L5
But even after doing that, I am unable to load the fine-tuned weights into sentence_transformers: it looks for a config.json, while the saved checkpoint only has an adapter_config.json.
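For reference, the workaround I have been attempting looks roughly like this (a sketch on my side, assuming the peft package is available and that merging the LoRA weights back into the base model is acceptable; the checkpoint path is a placeholder):

```python
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

# load the base model and attach the fine-tuned adapter
# (the directory containing adapter_config.json)
base = AutoModel.from_pretrained("BAAI/bge-m3")
model = PeftModel.from_pretrained(base, "path/to/lora_checkpoint")

# fold the LoRA weights into the base weights and save a regular checkpoint,
# which writes the config.json that sentence_transformers expects
merged = model.merge_and_unload()
merged.save_pretrained("bge-m3-merged")
AutoTokenizer.from_pretrained("BAAI/bge-m3").save_pretrained("bge-m3-merged")
```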
Do you plan to add support for PEFT for BGE-M3? I am observing catastrophic forgetting with BGE-M3 on some datasets during fine-tuning, so I wanted to try PEFT.
I'm trying to fine-tune BGE-M3 based on the README here: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune
I originally started with the latest transformers version a few weeks ago and found that the loss and grad norms were spiking during fine-tuning, despite using gradient clipping with max_grad_norm=1. Then I noticed the comment in your README quoted above, about downgrading to versions 4.33-4.36.
So I downgraded to transformers 4.33 and things seemed to work fine. However, when fine-tuning the full model on my dataset, I observed that performance on my in-domain test set degraded compared to using BGE-M3 zero-shot. So I thought I would try adapters/LoRA with BGE-M3 and made some code changes to try it out, based on the "Train a PEFT Adapter" section of this page: https://huggingface.co/docs/transformers/main/en/peft
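Concretely, the change I made looks roughly like this (a sketch following that docs page; the rank, alpha, dropout, and target_modules values are just placeholders I chose for the XLM-RoBERTa-style attention layers in BGE-M3):

```python
from peft import LoraConfig
from transformers import AutoModel

model = AutoModel.from_pretrained("BAAI/bge-m3")

# LoRA hyperparameters are placeholders; BGE-M3 is XLM-RoBERTa-based,
# so the attention projections are named "query", "key", and "value"
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],
)

# uses the transformers-side PEFT integration described in the linked docs page
model.add_adapter(lora_config)
```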
When testing this code change, I faced an error indicating that I cannot use/add adapters with transformers 4.33. So I moved to 4.36, and I am now able to add adapters to the model class. But now I face a different issue with the tokenizer: the TypeError: unhashable type: 'dict' shown above.
This seems to be a known issue that was reported here and fixed here. If I want to pull this fix, I will need to use a transformers version other than 4.36 now.
Why does the latest transformers not work well with BGE-M3 fine-tuning? Is there some way we can make BGE-M3 fine-tuning work successfully with the latest version of transformers? That would ensure users get all the latest fixes rather than having to patch old versions and files in a hacky way. Alternatively, what would you recommend to help unblock this?
Thank you!