beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Quora titles are empty on Huggingface #121

Closed jxmorris12 closed 1 year ago

jxmorris12 commented 1 year ago

When I load the BeIR/quora dataset through Hugging Face, it looks like all of the queries are empty:

>>> import datasets
>>> d = datasets.load_dataset('BeIR/quora', 'corpus')
>>> d['corpus'][0]
{'_id': '1', 'title': '', 'text': 'What is the step by step guide to invest in share market in india?'}
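
A quick way to confirm which fields are actually empty is to count blank strings per field over the loaded rows. The following is only a minimal sketch: `empty_field_counts` is a hypothetical helper (not part of `datasets` or BEIR), and it is run here on the single record shown above rather than on the full corpus.

```python
from collections import Counter
from typing import Iterable, Mapping


def empty_field_counts(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count, per field name, how many rows hold an empty or whitespace-only string.

    Hypothetical helper for illustration; not provided by `datasets` or BEIR.
    """
    counts: Counter = Counter()
    for row in rows:
        for field, value in row.items():
            # Only string fields are checked; non-string values are skipped.
            if isinstance(value, str) and not value.strip():
                counts[field] += 1
    return counts


# The record printed in the session above, copied by hand.
sample = [
    {
        "_id": "1",
        "title": "",
        "text": "What is the step by step guide to invest in share market in india?",
    }
]

print(empty_field_counts(sample))
```

The same function should work on any iterable of dict-like rows, so it could presumably be passed `d['corpus']` directly. On the record above it reports only `title` as empty, while `text` holds the question itself.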

jxmorris12 commented 1 year ago

Update: I found some queries here, but they don't seem to match the corpus:

datasets.load_dataset('BeIR/quora', 'queries')

jxmorris12 commented 1 year ago

Ah I figured it out.