FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

transformers version when fine-tuning #609

Open g-karthik opened 7 months ago

g-karthik commented 7 months ago

I'm trying to fine-tune BGE-M3 based on the README here: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune

I originally started with the latest transformers version a few weeks ago and found that the loss and gradient norms were spiking during fine-tuning (despite using gradient clipping with max_grad_norm=1).
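For reference, gradient clipping in transformers is typically configured through TrainingArguments. This is only a minimal sketch; everything other than max_grad_norm=1 is illustrative and not the exact setup used here:

```python
from transformers import TrainingArguments

# Minimal sketch: gradient clipping is controlled by max_grad_norm.
# All other values below are illustrative placeholders.
training_args = TrainingArguments(
    output_dir="./bge-m3-finetune",   # hypothetical output path
    learning_rate=1e-5,               # illustrative
    per_device_train_batch_size=4,    # illustrative
    num_train_epochs=1,               # illustrative
    max_grad_norm=1.0,                # clip gradient norm to 1
)
```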

Then, I noticed this comment in your README:

The new version of Transformers may pose issues for fine-tuning. If you encounter problems, you can try to downgrade to versions 4.33-4.36.

So I downgraded to transformers 4.33 and things seemed to work fine. However, when fine-tuning the full model on my dataset, I observed that performance on my in-domain test set degraded compared to using BGE-M3 zero-shot. So I thought I would try adapters/LoRA with BGE-M3 and made some code changes to try it out, based on the "Train a PEFT Adapter" section of this page: https://huggingface.co/docs/transformers/main/en/peft
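The change roughly follows the transformers PEFT integration shown on that docs page. This is only a sketch, not the exact code change: the target module names assume BGE-M3's XLM-RoBERTa backbone, and the LoRA hyperparameters are illustrative.

```python
from transformers import AutoModel
from peft import LoraConfig

# Rough sketch following the "Train a PEFT Adapter" docs page linked above.
# target_modules assumes XLM-RoBERTa attention layer names; r/alpha/dropout
# are illustrative values, not the exact configuration used here.
model = AutoModel.from_pretrained("BAAI/bge-m3")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "key", "value"],  # assumed module names
)

# transformers' built-in PEFT integration; this kind of call is what fails
# on transformers 4.33 but works on 4.36.
model.add_adapter(lora_config)
```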

When testing this code change, I hit an error indicating that adapters cannot be used/added with transformers 4.33. I therefore moved to 4.36 and can now add adapters to the model class. But now I face a different issue, with the tokenizer:

  File "/FlagEmbedding/FlagEmbedding/baai_general_embedding/finetune/run.py", line 59, in main
    tokenizer = AutoTokenizer.from_pretrained(
  File "/environments/my_conda_env/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 787, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/environments/my_conda_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/environments/my_conda_env/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2253, in _from_pretrained
    init_kwargs[key] = added_tokens_map.get(init_kwargs[key], init_kwargs[key])
TypeError: unhashable type: 'dict'

This seems to be a known issue that was reported here and fixed here. If I want to pull this fix, I will need to use a transformers version other than 4.36 now.

Why does the latest transformers not work well with BGE-M3 fine-tuning? Is there some way to make BGE-M3 fine-tuning work with the latest version of transformers? That would ensure users get all the latest fixes rather than having to patch old versions/files in a hacky way. Alternatively, what would you recommend to unblock this?

Thank you!

staoxiao commented 7 months ago

The latest transformers can be used to fine-tune BGE-M3. The TypeError: unhashable type: 'dict' error is caused by the tokenizer_config.json. We have updated that file on Hugging Face; you can download it and re-run the code.
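A minimal sketch of pulling the updated file, assuming the standard force_download argument to from_pretrained rather than clearing the local cache by hand:

```python
from transformers import AutoTokenizer

# Minimal sketch: force a fresh download of the updated tokenizer files
# instead of reusing a stale local cache.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3", force_download=True)
```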

g-karthik commented 7 months ago

Thanks for your response - I unblocked myself on this error by downgrading to transformers 4.34, but I can test with the latest transformers as well. Is the following comment in your README no longer true? If so, perhaps you could remove it from the README.

The new version of Transformers may pose issues for fine-tuning. If you encounter problems, you can try to downgrade to versions 4.33-4.36.

Regarding the PEFT/adapter use case I mentioned in my issue: I found that I need to modify sentence_transformers in the Python site-packages of my conda env so that the line below runs correctly during fine-tuning. https://github.com/FlagOpen/FlagEmbedding/blob/0fe98ddefbd9d2caced52e94b9f8421d5ba7317f/FlagEmbedding/baai_general_embedding/finetune/trainer.py#L5

But even after doing that, I am unable to load the fine-tuned weights into sentence_transformers: it looks for a config.json, while the saved checkpoint only has an adapter_config.json.
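One possible workaround (not something suggested in this thread, just a sketch with placeholder paths): merge the LoRA weights back into the base model with PEFT so that a full transformers checkpoint, including config.json, is written.

```python
from transformers import AutoModel
from peft import PeftModel

# Possible workaround sketch: merge the LoRA adapter into the base model so
# a regular checkpoint with a config.json is saved. Paths are placeholders.
base = AutoModel.from_pretrained("BAAI/bge-m3")
peft_model = PeftModel.from_pretrained(base, "path/to/adapter_checkpoint")

merged = peft_model.merge_and_unload()          # fold adapter weights into the base
merged.save_pretrained("path/to/merged_model")  # writes config.json + model weights
```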

Do you plan to add PEFT support for BGE-M3? I am observing catastrophic forgetting with BGE-M3 on some datasets during fine-tuning, which is why I wanted to try PEFT.