deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0
1.73k stars 247 forks

Fine-Tuned Transformers model conversion failed using Converter.convert_from_transformers() #839

Closed PhaneendraGunda closed 2 years ago

PhaneendraGunda commented 2 years ago

Describe the bug The FARM conversion module Converter.convert_from_transformers fails to load a fine-tuned Transformers model from local storage. FARM's LanguageModel looks for language_model_config.json, which is not present in a fine-tuned Hugging Face Transformers checkpoint.

from pathlib import Path

import torch

from farm.conversion.transformers import Converter

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def convert_from_transformers():
    # Load a fine-tuned transformers checkpoint from local storage
    # (-> continue training / compare models / ...)
    model = Converter.convert_from_transformers(
        model_name_or_path='../models/fquad/checkpoint-19500',
        device=device,
        task_type='question_answering'
    )
    # Save in FARM format
    model.save(Path("../models/fquad/haystack_checkpoint-19500"))

Error message

08/27/2021 17:27:58 - INFO - farm.modeling.language_model - LOADING MODEL
08/27/2021 17:27:58 - INFO - farm.modeling.language_model - =============
08/27/2021 17:28:08 - INFO - farm.modeling.language_model - Could not find ../models/fquad/checkpoint-19500 locally.
08/27/2021 17:28:09 - INFO - farm.modeling.language_model - Looking on Transformers Model Hub (in local cache and online)...

Here is the relevant snippet of LanguageModel.load() from FARM:

[screenshot of the LanguageModel.load() snippet]

Expected code

config_file = Path(pretrained_model_name_or_path) / "config.json"
if os.path.exists(config_file):
    logger.info(f"Model found locally at {pretrained_model_name_or_path}")
    # it's a local directory in FARM format
    config = json.load(open(config_file))
    language_model = cls.subclasses[config["_name_or_path"]].load(pretrained_model_name_or_path)
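One caveat with the expected code above: in a plain transformers config.json, the _name_or_path field holds a filesystem path or hub id, not a class name, so it cannot key FARM's subclass registry directly. The closest usable field is model_type. A minimal sketch of the mapping (detect_farm_class is a hypothetical helper, not part of FARM, and it assumes the registry is keyed by capitalized family names such as "Bert"):

```python
import json
from pathlib import Path

def detect_farm_class(checkpoint_dir):
    """Infer a FARM-style class name from a plain transformers config.json.

    Hypothetical helper: assumes FARM's subclass registry is keyed by
    capitalized family names such as "Bert" or "Roberta".
    """
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    # transformers stores the architecture family under "model_type",
    # e.g. "bert", "roberta", "electra"
    return config["model_type"].capitalize()
```

For example, a checkpoint fine-tuned from roberta-base has model_type "roberta", so the lookup key would become "Roberta".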

System:

julian-risch commented 2 years ago

Hi @PhaneendraGunda you've already found the source of the error. FARM currently only supports loading from a local directory if the stored model is in FARM format. That would work for you if you load a model from the model hub and then fine-tune it with FARM. However, it currently does not work out-of-the-box if you fine-tune the model with transformers outside of FARM.

I think renaming the file path to config_file = Path(pretrained_model_name_or_path) / "config.json" in the following line of code will not solve the problem: https://github.com/deepset-ai/FARM/blob/004ed2c6dbd9d7a2af85d57ec89a689b9363e6c1/farm/modeling/language_model.py#L138

The reason is that the FARM format is slightly different from the format used on the model hub. For example, FARM uses a field called name and accesses it here: https://github.com/deepset-ai/FARM/blob/004ed2c6dbd9d7a2af85d57ec89a689b9363e6c1/farm/modeling/language_model.py#L143

If you are interested, you're very much welcome to change the code to allow loading a local model that is not in FARM format. Could you imagine making a contribution here?
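For anyone picking this up, here is a rough sketch of what such a fallback could look like: prefer FARM's own language_model_config.json, otherwise fall back to the plain config.json written by a transformers fine-tuning run. The function name load_language_model and the subclasses argument are placeholders for illustration, not FARM's actual signature:

```python
import json
from pathlib import Path

def load_language_model(pretrained_dir, subclasses):
    """Sketch of a two-step lookup (illustrative, not FARM's real code)."""
    farm_config = Path(pretrained_dir) / "language_model_config.json"
    hf_config = Path(pretrained_dir) / "config.json"
    if farm_config.exists():
        # FARM format stores the class name under "name"
        config = json.loads(farm_config.read_text())
        return subclasses[config["name"]].load(pretrained_dir)
    if hf_config.exists():
        # transformers format: map "model_type" (e.g. "bert") to a
        # registry key (e.g. "Bert"), assuming capitalized registry keys
        config = json.loads(hf_config.read_text())
        return subclasses[config["model_type"].capitalize()].load(pretrained_dir)
    raise FileNotFoundError(f"No model config found in {pretrained_dir}")
```

The FARM branch stays untouched, so existing FARM-format checkpoints keep loading exactly as before; only directories without language_model_config.json take the new path.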

PhaneendraGunda commented 2 years ago

Makes sense. I would love to update the code to load a fine-tuned Transformers model from a local path. Let me explore the FARM library further, as I am new to it. Anyway, we can discuss further in the Haystack Slack channel if you are there. Thank you for your time.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.