OpenBMB / InfiniteBench

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
MIT License

Error in loading from Huggingface #19

Open BenHamm opened 1 month ago

BenHamm commented 1 month ago

When I try to run the following code in colab:

from datasets import load_dataset
dataset = load_dataset("xinrongzhang2022/InfiniteBench")

I get the following error:

DatasetGenerationCastError: An error occurred while generating the dataset

All the data files must have the same columns, but at some point there are 1 missing columns ({'options'})

This happened while the json dataset builder was generating data using

hf://datasets/xinrongzhang2022/InfiniteBench/kv_retrieval.jsonl (at revision 2c3c9fe62808833ab783026bbf8e7a47539a28c6)

Please either edit the data files to have matching columns, or separate them into different configurations (see docs at https://hf.co/docs/hub/datasets-manual-configuration#multiple-configurations)

tuantuanzhang commented 1 month ago

We have re-uploaded the files and solved this problem. Please kindly use load_dataset("xinrongzhang2022/InfiniteBench") now!

HivaMohammadzadeh1 commented 4 weeks ago

Running into this issue as well

tuantuanzhang commented 3 weeks ago

Please set features explicitly when loading the dataset:

from datasets import load_dataset, Features, Value, Sequence

# Declare a single schema up front, including the "options" column that some files lack.
ft = Features({"id": Value("int64"), "context": Value("string"), "input": Value("string"), "answer": Sequence(Value("string")), "options": Sequence(Value("string"))})
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)
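For anyone debugging a similar DatasetGenerationCastError, here is a minimal standard-library sketch for checking which expected columns a JSONL file lacks before loading it. The column names are taken from the features definition above. The helper missing_columns and the sample records are hypothetical illustrations, not part of InfiniteBench or the datasets library.

```python
import json

# Column names taken from the features definition posted in this thread.
EXPECTED_COLUMNS = ("id", "context", "input", "answer", "options")


def missing_columns(jsonl_lines, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from at least one record."""
    missing = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        missing.update(col for col in expected if col not in record)
    return missing


# Hypothetical records: the first one has no "options" key, which mirrors
# the mismatch reported in the error message for kv_retrieval.jsonl.
sample = [
    '{"id": 0, "context": "c", "input": "q", "answer": ["a"]}',
    '{"id": 1, "context": "c", "input": "q", "answer": ["B"], "options": ["A", "B"]}',
]

print(missing_columns(sample))  # {'options'}
```

To check a downloaded file, pass an open file handle instead of the sample list, for example missing_columns(open("kv_retrieval.jsonl", encoding="utf-8")), assuming the file has been saved locally under that name.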