Open gouqi666 opened 2 years ago
Thank you for the question! If you download the data manually and save it locally, you can specify DATA_DIR here, and you should be fine. There is also sample code for loading the dataset here.
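As a rough illustration of the manual route, here is a minimal sketch that reads a locally saved JSON file from DATA_DIR using only the standard library. The default folder `data/multidoc2dial`, the helper names `load_local_json` and `top_level_fields`, and the example file name are assumptions made for illustration; they are not taken from the repository's scripts.

```python
import json
import os
from pathlib import Path

# Assumption: DATA_DIR is the folder holding the manually downloaded files.
# The default path below is hypothetical; point it at your own download.
DATA_DIR = Path(os.environ.get("DATA_DIR", "data/multidoc2dial"))


def load_local_json(filename, data_dir=DATA_DIR):
    """Load one JSON file from the local data directory."""
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected a downloaded file at {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def top_level_fields(obj):
    """Return the sorted top-level keys of a JSON object (empty list otherwise)."""
    return sorted(obj.keys()) if isinstance(obj, dict) else []
```

Calling `top_level_fields(load_local_json("some_file.json"))` (file name hypothetical) on each downloaded file is a quick way to see which fields are present before comparing them with what the loading code expects.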
We recently updated the test split of the MultiDoc2Dial dataset, which caused some issues when using Hugging Face load_dataset. @sivasankalpp Could you please follow up on this? Thank you very much!
Hi @gouqi666, happy to help! Can you share the command you tried to download the dataset?
Hi @sivasankalpp, I meant the Hugging Face multidoc2dial dataset. I think we need to run their command to update dataset_infos.json and then test whether load_dataset works with the latest multidoc2dial download. This should resolve the data loading issue reported here.
Also, in model_convert.py, the call retriever = RagRetriever(model.config, question_encoder_tokenizer, generator_tokenizer) raises ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.16.1/datasets/wiki_dpr/wiki_dpr.py
The link works for me.
Hi, when I try to run your code, I find that I can't download the datasets using load_dataset. The error is "HF google storage unreachable. Downloading and preparing it from source". Although I am using a VPN, the problem is still there. So I want to download the data manually, but I found that the data is mismatched in some fields. Could you help me? Thanks.