Open wendywangwwt opened 6 months ago
Hi! I'm working on a long document QA problem and looked into the MultiFieldQA-en dataset recently.
I downloaded the dataset using the following code snippet:
from datasets import load_dataset
dataset = load_dataset("THUDM/LongBench", "multifieldqa_en")
While examining the content, I noticed that out of 150 entries, 2 are in Chinese rather than English: .
Can you please take a look? Thank you!
Hi! They are classified as English samples as they contain more English characters (a-zA-Z) than Chinese characters.
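For anyone who wants to reproduce this, here is a minimal sketch of that kind of character-count heuristic. It is not the maintainers' actual script: the function names `classify_language` and `cjk_share` are hypothetical, and the Chinese range used (CJK Unified Ideographs, U+4E00 to U+9FFF) is an assumption.

```python
import re

# English letters, as described in the reply above (a-zA-Z).
LATIN = re.compile(r"[a-zA-Z]")
# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def classify_language(text: str) -> str:
    """Return "en" if the text has more a-zA-Z characters than CJK ones, else "zh"."""
    latin = len(LATIN.findall(text))
    cjk = len(CJK.findall(text))
    return "en" if latin > cjk else "zh"


def cjk_share(text: str) -> float:
    """Fraction of counted characters (Latin + CJK) that are CJK; 0.0 if none are counted."""
    latin = len(LATIN.findall(text))
    cjk = len(CJK.findall(text))
    total = latin + cjk
    return cjk / total if total else 0.0
```

A mostly Chinese document with a long English appendix or bibliography can still come out as "en" under this rule, which would explain the two entries. Running `cjk_share` over each entry's text and sorting by the result is one way to surface such mixed-language samples and filter them out locally.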