deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.58k stars 1.82k forks source link

FARMReader Cannot use inconsistently named model #140

Closed klys-equinix closed 4 years ago

klys-equinix commented 4 years ago

I am trying to use https://huggingface.co/henryk/bert-base-multilingual-cased-finetuned-polish-squad2 model in the FARMReader. The model is named rather unfortunately, as it's name contains both 'mutlilingual' and 'polish'. As a result, i get following error:

ValueError: Could not automatically detect from language model name what language it is.
     Found multiple matches: ['polish', 'multilingual']
     Please init the language model by manually supplying the 'language' as a parameter.

Please add an option to pass model language as arguments. Here is full stacktrace:

<ipython-input-25-518aa04f83bc> in <module>
      1 document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
----> 2 retriever = FARMReader(model_name_or_path="henryk/bert-base-multilingual-cased-finetuned-polish-squad2",
      3                             use_gpu=True, no_ans_boost=-10, context_window_size=500,
      4                             top_k_per_candidate=3, top_k_per_sample=1,
      5                             num_processes=8, max_seq_len=256, doc_stride=128)

~/.local/lib/python3.8/site-packages/haystack/reader/farm.py in __init__(self, model_name_or_path, context_window_size, batch_size, use_gpu, no_ans_boost, top_k_per_candidate, top_k_per_sample, num_processes, max_seq_len, doc_stride)
     85             self.return_no_answers = True
     86         self.top_k_per_candidate = top_k_per_candidate
---> 87         self.inferencer = Inferencer.load(model_name_or_path, batch_size=batch_size, gpu=use_gpu,
     88                                           task_type="question_answering", max_seq_len=max_seq_len,
     89                                           doc_stride=doc_stride, num_processes=num_processes)

~/.local/lib/python3.8/site-packages/farm/infer.py in load(cls, model_name_or_path, batch_size, gpu, task_type, return_class_probs, strict, max_seq_len, doc_stride, extraction_layer, extraction_strategy, s3e_stats, num_processes, disable_tqdm)
    207                                  "'question_answering', 'embeddings', 'text_classification', 'ner'")
    208 
--> 209             model = AdaptiveModel.convert_from_transformers(model_name_or_path, device, task_type)
    210             config = AutoConfig.from_pretrained(model_name_or_path)
    211             tokenizer = Tokenizer.load(model_name_or_path)

~/.local/lib/python3.8/site-packages/farm/modeling/adaptive_model.py in convert_from_transformers(cls, model_name_or_path, device, task_type, processor)
    578         :return: AdaptiveModel
    579         """
--> 580         lm = LanguageModel.load(model_name_or_path)
    581         #TODO Infer type of head automatically from config
    582 

~/.local/lib/python3.8/site-packages/farm/modeling/language_model.py in load(cls, pretrained_model_name_or_path, n_added_tokens, language_model_class, **kwargs)
    143 
    144             if language_model_class:
--> 145                 language_model = cls.subclasses[language_model_class].load(pretrained_model_name_or_path, **kwargs)
    146             else:
    147                 language_model = None

~/.local/lib/python3.8/site-packages/farm/modeling/language_model.py in load(cls, pretrained_model_name_or_path, language, **kwargs)
    389             # Pytorch-transformer Style
    390             bert.model = BertModel.from_pretrained(str(pretrained_model_name_or_path), **kwargs)
--> 391             bert.language = cls._get_or_infer_language_from_name(language, pretrained_model_name_or_path)
    392         return bert
    393 

~/.local/lib/python3.8/site-packages/farm/modeling/language_model.py in _get_or_infer_language_from_name(cls, language, name)
    218             return language
    219         else:
--> 220             return cls._infer_language_from_name(name)
    221 
    222     @classmethod

~/.local/lib/python3.8/site-packages/farm/modeling/language_model.py in _infer_language_from_name(cls, name)
    241             )
    242         elif len(matches) > 1:
--> 243             raise ValueError(
    244                 "Could not automatically detect from language model name what language it is.\n"
    245                 f"\t Found multiple matches: {matches}\n"

ValueError: Could not automatically detect from language model name what language it is.
     Found multiple matches: ['polish', 'multilingual']
     Please init the language model by manually supplying the 'language' as a parameter.
Timoeller commented 4 years ago

This bug is unfortunate and will require some time to be fixed, because we need to update FARM first.

One solution would be to use a TransformersReader instead of FARMReader with:


reader = TransformersReader(
    model="henryk/bert-base-multilingual-cased-finetuned-polish-squad2",
    tokenizer="henryk/bert-base-multilingual-cased-finetuned-polish-squad2", 
    use_gpu=-1)
klys-equinix commented 4 years ago

Thx for the reply. I have been playing around with the TransformerReader too, but in my experience FARMReader is more useful. Will wait for the fix. Also, I have been trying to use FARM directly by

model = LanguageModel.load("henryk/bert-base-multilingual-cased-finetuned-polish-squad2", language='polish')
nlp = Inferencer.load(model, 
               task_type='question_answering', gpu=True)

which obviously didn't work because one can't just pass model to Inferencer.load(). But the Inferencer.load() also does not provide an option to pass a language param so you could use the model without LanguageModel.load. Can you also update the Inferencer.load in FARM for that?

brandenchan commented 4 years ago

Hi, in the latest version of FARM that ValueError that you were getting is now just a warning. Could you install the latest version of Haystack (which in turn installs a later version of FARM) and try again?

klys-equinix commented 4 years ago

Hi, just came back to project. As of now the problem no longer appears, so I am closing the issue