deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json' #115

Closed · thaisnang closed this issue 4 years ago

thaisnang commented 4 years ago

For some reason, the config file is not getting written into the folder. I have tried changing the folder permissions, but it didn't help.

tanaysoni commented 4 years ago

Hi @thaisnang, could you provide more details on what code/tutorial you're running and the full error stack trace that you get?

thaisnang commented 4 years ago

I ran Tutorial 1 with FARMReader. It ran the first time; then I tried again, but this time with the offline model (the same base RoBERTa model). After that it always gives the following:

05/19/2020 14:16:31 - INFO - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.004s]
05/19/2020 14:16:31 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:16:31 - INFO - elasticsearch - POST http://localhost:9200/_count [status:200 request:0.536s]

05/19/2020 14:16:52 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.665s]
05/19/2020 14:16:53 - INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:0.399s]
05/19/2020 14:16:53 - INFO - haystack.indexing.io - Wrote 517 docs to DB
05/19/2020 14:16:53 - INFO - farm.utils - device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: None
05/19/2020 14:17:04 - WARNING - farm.modeling.language_model - Could not automatically detect from language model name what language it is. We guess it's an ENGLISH model ... If not: Init the language model by supplying the 'language' param.
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 123, in <module>
    reader = FARMReader(model_name_or_path="roberta-base-squad2", use_gpu=True)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/farm.py", line 86, in __init__
    doc_stride=doc_stride, num_processes=num_processes)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/infer.py", line 194, in load
    processor = Processor.load_from_dir(model_name_or_path)
  File "/home/imsai/.local/lib/python3.6/site-packages/farm/data_handler/processor.py", line 182, in load_from_dir
    config = json.load(open(processor_config_file))
FileNotFoundError: [Errno 2] No such file or directory: 'roberta-base-squad2/processor_config.json'


I tried with the Transformers reader as well, and it gives the following:

05/19/2020 14:22:30 - INFO - elasticsearch - PUT http://localhost:9200/document [status:400 request:0.036s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Found data stored in data/article_txt_got. Delete this first if you really want to fetch new data.
05/19/2020 14:22:30 - INFO - elasticsearch - POST http://localhost:9200/_count [status:200 request:0.004s]
05/19/2020 14:22:30 - INFO - haystack.indexing.io - Skip writing documents since DB already contains 517 docs ... (Disable only_empty_db, if you want to add docs anyway.)
05/19/2020 14:22:38 - INFO - elasticsearch - POST http://localhost:9200/document/_search [status:200 request:0.318s]
05/19/2020 14:22:38 - INFO - haystack.retriever.elasticsearch - Got 10 candidates from retriever
05/19/2020 14:22:38 - INFO - haystack.finder - Reader is looking for detailed answer in 362347 chars ...
convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 7.14it/s]
add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 5629.94it/s]
Traceback (most recent call last):
  File "Tutorial1_Basic_QA_Pipeline.py", line 140, in <module>
    prediction = finder.get_answers(question="Who is the father of Arya Stark?", top_k_retriever=10, top_k_reader=5)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/finder.py", line 45, in get_answers
    top_k=top_k_reader)
  File "/home/imsai/.local/lib/python3.6/site-packages/haystack/reader/transformers.py", line 77, in predict
    predictions = self.model(query, topk=self.n_best_per_passage)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in __call__
    for s, e, score in zip(starts, ends, scores)
  File "/home/imsai/.local/lib/python3.6/site-packages/transformers/pipelines.py", line 1042, in <listcomp>
    for s, e, score in zip(starts, ends, scores)
KeyError: 0

thaisnang commented 4 years ago

Note: I did not download RoBERTa separately; I just renamed the files from the cache it automatically downloaded. I am sure I renamed them properly. Hopefully, this is not affecting it.

tanaysoni commented 4 years ago

Hi @thaisnang, by default, the models are cached and are not re-downloaded on every execution. If that doesn't fit your workflow, I am curious to know more about how you plan to use the save (offline) functionality.

Here's how you can save the model of a FARMReader:

reader.inferencer.save("path-to-save")

and load it again by supplying the path:

reader = FARMReader(model_name_or_path="path-to-save")
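For reference, here is a minimal end-to-end sketch of that save/load workflow, assuming the Tutorial 1 setup; the hub model name deepset/roberta-base-squad2 and the local directory saved_models/roberta-base-squad2 are illustrative choices, not something prescribed by the tutorial:

from haystack.reader.farm import FARMReader

# First (online) run: downloads the model from the model hub into the local cache.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# Persist the model together with its processor files (including
# processor_config.json) to a directory of your choice.
reader.inferencer.save("saved_models/roberta-base-squad2")

# Later (offline) runs: load the reader straight from that directory
# instead of renaming files from the cache by hand.
reader = FARMReader(model_name_or_path="saved_models/roberta-base-squad2", use_gpu=True)

The point of saving through the inferencer is that it writes out the processor config alongside the model weights, which is the file the FileNotFoundError above is complaining about; a manually renamed cache directory does not contain it.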

thaisnang commented 4 years ago

Actually, I saw the model downloading again when I ran it the second time. So I thought, instead of downloading on every execution, why don't I just copy the cached model, properly rename it, and use it as an offline model? That's what I did; it should not interfere with the function, right?

thaisnang commented 4 years ago

OK, I downloaded it again, and this time the model did not re-download; it was using the cached model. The model was saved as well. Thanks.