ArvinZhuang / DSI-QG

The official repository for "Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation", Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon and Daxin Jiang.
MIT License
110 stars, 16 forks

Cannot use Hugging Face datasets and models online #11

Status: Open. Opened by bencaocs 11 months ago

bencaocs commented 11 months ago

If I can't access Hugging Face datasets and models online, is there another way to use this code? I downloaded the dataset (Tevatron/msmarco-passage-corpus) to disk and processed it with process_marco.py, and that worked.

But when I run run.py, I get the following error:

Traceback (most recent call last):
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1724, in from_pretrained
    resolved_vocab_files[file_id] = cached_path(
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/file_utils.py", line 1921, in cached_path
    output_path = get_from_cache(
  File "/home/bio-3090ti/anaconda3/envs/DSI-transform/lib/python3.8/site-packages/transformers/file_utils.py", line 2177, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Thanks.

ArvinZhuang commented 11 months ago

Hi, could you also try downloading the Hugging Face t5-base (or t5-large) model from https://huggingface.co/t5-base to disk and loading it from there?
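Before pointing the code at a manually downloaded model, it can help to confirm the folder actually holds the files the loader will look for. The sketch below is a hypothetical helper (`missing_t5_files` is not part of DSI-QG or transformers). It assumes the three files published in the t5-base repository on the Hub: `config.json`, `spiece.model` (the SentencePiece vocabulary used by the T5 tokenizer) and `pytorch_model.bin`. Newer transformers releases can also read `model.safetensors`, but older ones like the version in the traceback above may expect `pytorch_model.bin`, so the check is deliberately conservative.

```python
from pathlib import Path

# Files assumed to be required, as published in the t5-base Hub repository.
REQUIRED_FILES = ("config.json", "spiece.model", "pytorch_model.bin")


def missing_t5_files(model_dir):
    """Return the required files that are absent from a local T5 folder.

    Hypothetical helper: it only checks that the files exist and does not
    validate their contents. An empty list means the folder looks complete.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    # Keep the order of REQUIRED_FILES so the report is predictable.
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `print(missing_t5_files("/path/to/t5-base"))` should print `[]` once all three files have been copied into that folder.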

bencaocs commented 11 months ago

Thanks for your fast reply. I think that could work, but I'm not sure about the file structure. My current file structure is:

DSI-QG
- __pycache__
- cache
  - downloads
  - Tevatron__msmarco-passage-corpus
    - default
- CE
- data
  - msmarco_data
    - 100k
    - X.tsv
- other files (.py, .sh, etc.)

If this structure is correct, where should I put t5-base after downloading it? In the same cache directory?

Thank you very much.

ArvinZhuang commented 11 months ago

Simply set --model_name in the run command to the directory where you saved the downloaded model.
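A minimal sketch of that advice, assuming the model was saved to a local folder of your choice. `offline_launch` is a hypothetical helper; it only sets `--model_name` (the flag named above) and leaves every other run.py argument to you via `extra_args`, since those are not covered in this thread. It also optionally sets the documented Hugging Face offline switches `TRANSFORMERS_OFFLINE=1` and `HF_DATASETS_OFFLINE=1`, which tell transformers and datasets not to contact the Hub; that part is an addition, not something the maintainer suggested.

```python
import os
import sys


def offline_launch(model_dir, extra_args=()):
    """Build (argv, env) for running run.py against a local model folder.

    Hypothetical helper. `model_dir` is the folder holding the downloaded
    t5-base files; `extra_args` stands for the other run.py arguments you
    already use, which are not listed here.
    """
    env = dict(os.environ)
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: do not try to reach the Hub
    env["HF_DATASETS_OFFLINE"] = "1"   # datasets: use only local/cached files
    argv = [
        sys.executable,
        "run.py",
        "--model_name",
        os.path.abspath(model_dir),  # an absolute path avoids cwd surprises
        *extra_args,
    ]
    return argv, env
```

You would then run it with something like `argv, env = offline_launch("/path/to/t5-base", my_other_args)` followed by `subprocess.run(argv, env=env, check=True)`. If you normally start run.py through a distributed launcher, keep that prefix; the parts that matter here are the `--model_name` value and the environment.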