THUDM / LongBench

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
MIT License

Load dataset from hf failed #68

Open · murphypei opened this issue 4 months ago

murphypei commented 4 months ago
from datasets import load_dataset

datasets = ['hotpotqa', '2wikimqa', 'musique', 'narrativeqa', 'qasper', 'multifieldqa_en', 'gov_report', 'qmsum', 'trec', 'samsum', 'triviaqa', 'passage_count', 'passage_retrieval_en', 'multi_news']
for dataset in datasets:
    print(f"Loading dataset {dataset}")
    data = load_dataset("THUDM/LongBench", dataset, split="test")
    output_path = f"{output_dir}/pred/{dataset}.jsonl"

File "/usr/local/lib/python3.9/dist-packages/datasets/packaged_modules/cache/cache.py", line 65, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for THUDM/LongBench for config '2wikimqa'
Available configs in the cache: ['dureader', 'hotpotqa', 'multifieldqa_en_e', 'qasper_e']

bys0318 commented 4 months ago

Hi, can you try deleting the cached files and download all over again?
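If it helps, deleting the cached copies can be scripted. Below is a minimal sketch; the `<namespace>___<dataset_name>` directory-name pattern and the default cache location are assumptions about how the `datasets` library lays out its cache:

```python
import shutil
from pathlib import Path

def clear_dataset_cache(cache_dir, namespace="THUDM"):
    """Delete cached dataset directories for one namespace, forcing a re-download.

    Assumes `datasets` stores caches under directories named
    `<namespace>___<dataset_name>` inside `cache_dir`.
    """
    removed = []
    for path in sorted(Path(cache_dir).glob(f"{namespace}___*")):
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path.name)
    return removed
```

By default the cache lives in `~/.cache/huggingface/datasets` (or wherever `HF_DATASETS_CACHE` points). Alternatively, `load_dataset(..., download_mode="force_redownload")` skips the cache without deleting anything.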

murphypei commented 4 months ago

> Hi, can you try deleting the cached files and download all over again?

Yes, and I've tested many times on both my local machine and in a Docker environment. I don't know if you can reproduce this error; maybe it's just my mistake. Thanks for your reply.

Finally I was forced to download the jsonl files and load them from local disk, which works.

I can still use this dataset, but I think this error may lead to reduced usage.
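For reference, loading the downloaded .jsonl files needs nothing beyond the standard library. A minimal sketch (file paths and field names are illustrative):

```python
import json

def load_jsonl(path):
    """Read a .jsonl file (one JSON object per line) into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

The same files can also be fed back through the library with `load_dataset("json", data_files="path/to/hotpotqa.jsonl", split="train")`.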

bys0318 commented 4 months ago

Glad to hear you've loaded the dataset! Perhaps this error is due to an outdated datasets version. You can try updating the package:

pip install -U datasets
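One way to confirm the upgrade actually landed in the interpreter that runs the script (rather than a different environment) is a quick version check. A small standard-library sketch:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For example, `print(installed_version("datasets"))` should report the freshly installed version.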
murphypei commented 4 months ago

> Glad to hear you've loaded the dataset! Perhaps this error is due to an outdated datasets version. You can try updating the package:
>
> pip install -U datasets

I have already upgraded it to the latest version, but it didn't work. Maybe it's a Hugging Face issue?