deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json' #115

Closed · thaisnang closed this issue 4 years ago

thaisnang commented 4 years ago

For some reason, the config file is not getting written into the folder. I have tried changing the folder permissions, but it didn't help.

tanaysoni commented 4 years ago

Hi @thaisnang, could you provide more details on what code/tutorial you're running and the full error stack trace that you get?

thaisnang commented 4 years ago

I ran Tutorial 1 with FARMReader. It ran the first time; then I tried again, but this time with the offline model (the same base RoBERTa model). After that it always gives the following:

05/19/2020 14:16:31 - INFO - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.004s]
05/19/2020 14:16:31 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:16:31 - INFO - elasticsearch - POST http://localhost:9200/_count [status:200 request:0.536s]

05/19/2020 14:16:52 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.665s]
05/19/2020 14:16:53 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.399s]
05/19/2020 14:16:53 - INFO - haystack.indexing.io - Wrote 517 docs to DB
05/19/2020 14:16:53 - INFO - farm.utils - device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
05/19/2020 14:17:04 - WARNING - farm.modeling.language_model - Could not automatically detect from language model name what language it is. We guess it's an ENGLISH model ... If not: Init the language model by supplying the 'language' param.
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 123, in <module>
    reader = FARMReader(model_name_or_path="roberta-base-squad2", use_gpu=True)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/farm.py", line 86, in __init__
    doc_stride=doc_stride, num_processes=num_processes)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/infer.py", line 194, in load
    processor = Processor.load_from_dir(model_name_or_path)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/data_handler/processor.py", line 182, in load_from_dir
    config = json.load(open(processor_config_file))
FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json'


I tried with the Transformers reader as well, and it gives the following:

05/19/2020 14:22:30 - INFO - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.036s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:22:30 - INFO - elasticsearch - POST http://localhost:9200/_count [status:200 request:0.004s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Skip writing documents since DB already contains 517 docs ... (Disable only_empty_db, if you want to add docs anyway.)
05/19/2020 14:22:38 - INFO - elasticsearch - POST http://localhost:9200/document/_search [status:200 request:0.318s]
05/19/2020 14:22:38 - INFO - haystack.retriever.elasticsearch - Got 10 candidates from retriever
05/19/2020 14:22:38 - INFO - haystack.finder - Reader is looking for detailed answer in 362347 chars ...
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 7.14it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5629.94it/s]
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 140, in <module>
    prediction = finder.get_answers(question="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/finder.py", line 45, in get_answers
    top_k=top_k_reader)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/transformers.py", line 77, in predict
    predictions = self.model(query, topk=self.n_best_per_passage)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in __call__
    for s, e, score in zip(starts, ends, scores)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in <listcomp>
    for s, e, score in zip(starts, ends, scores)
KeyError: 0

thaisnang commented 4 years ago

Note: I did not download RoBERTa separately; I just renamed the files from the cache it automatically downloaded. I am sure I renamed them properly. Hopefully, this is not affecting it.

tanaysoni commented 4 years ago

Hi @thaisnang, by default, the models are cached and are not re-downloaded on every execution. If that doesn't fit your workflow, I am curious to know more about how you plan to use the save (offline) functionality.

Here's how you can save the model of a FARMReader:

reader.inferencer.save("path-to-save")

and load it again by supplying the path:

reader = FARMReader(model_name_or_path="path-to-save")
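For reference, here is a minimal end-to-end sketch of that save/load workflow, assuming the Tutorial 1 setup; the hub model name deepset/roberta-base-squad2 and the local directory saved_models/roberta-base-squad2 are illustrative choices, not something prescribed by the tutorial:

from haystack.reader.farm import FARMReader

# First (online) run: downloads the model from the model hub into the local cache.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# Persist the model together with its processor files (including
# processor_config.json) to a directory of your choice.
reader.inferencer.save("saved_models/roberta-base-squad2")

# Later (offline) runs: load the reader straight from that directory
# instead of renaming files from the cache by hand.
reader = FARMReader(model_name_or_path="saved_models/roberta-base-squad2", use_gpu=True)

The point of saving through the inferencer is that it writes out the processor config alongside the model weights, which is the file the FileNotFoundError above is complaining about; a manually renamed cache directory does not contain it.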

thaisnang commented 4 years ago

Actually, I saw the model downloading again when I ran it the second time. So I thought, instead of downloading on every execution, why don't I just copy the cached model, properly rename it, and use it as an offline model? That's what I did; it should not interfere with the function, right?

thaisnang commented 4 years ago

OK, I downloaded it again, and this time the model did not re-download; it was using the cached model. The model was saved as well. Thanks.