deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Workaround transformers overwriting model_type when saving dpr models #765

Closed · julian-risch closed 3 years ago

julian-risch commented 3 years ago

For DPR models, transformers overwrites the model_type parameter specified in a model's config when saving the model: https://github.com/huggingface/transformers/blob/64e78564a519cda2b4408803e2781c604e1e3bdd/src/transformers/configuration_utils.py#L626

For DPR models with a non-BERT tokenizer, e.g., CamembertTokenizer as in haystack issue #1046, this prevents loading the correct tokenizer when the model is loaded again. For example, the model_type camembert is specified in the config when the model is loaded from the model hub, but model_type dpr is written when the model is saved. To work around this, we set the model_type of transformers.DPRConfig on the fly to the model_type specified in the model's config file, as sketched below.
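The idea boils down to patching the class-level model_type on transformers.DPRConfig right before saving, so the value written to config.json matches the one the model was loaded with. Here is a minimal sketch of that idea, not the actual FARM code; the helper name save_dpr_model_with_original_type and the restore step in the finally block are illustrative assumptions.

```python
# Minimal sketch of the workaround, not the actual FARM implementation.
# Assumes `original_model_type` was read from the model's config.json at load
# time (e.g. "camembert"), before transformers resets it to "dpr" on save.
import transformers


def save_dpr_model_with_original_type(model, original_model_type: str, save_dir: str):
    # transformers writes the class-level model_type ("dpr" for DPRConfig)
    # into config.json when serializing the config, overwriting whatever the
    # loaded config contained (see the configuration_utils.py line linked
    # above). Patching the class attribute makes the original value persist,
    # so the correct tokenizer can be resolved on the next load.
    transformers.DPRConfig.model_type = original_model_type
    try:
        model.save_pretrained(save_dir)
    finally:
        # Restore the default so the patch does not leak into other saves.
        transformers.DPRConfig.model_type = "dpr"
```

With the original model_type back in config.json, loading the saved model again resolves the tokenizer named in the config (e.g. CamembertTokenizer) instead of falling back to the BERT-style DPR tokenizer.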

This PR corresponds to haystack PR #1060

Timoeller commented 3 years ago

:heart: ;)