facebookresearch / KILT

Library for Knowledge Intensive Language Tasks
MIT License

Missing attribution data in KILT NQ training #66

Open rhofour opened 1 year ago

rhofour commented 1 year ago

My understanding is that a lot of the value of KILT comes from the gold attribution data. In the KILT data format, wikipedia_id is listed as mandatory, but when I started working with nq-train-kilt.jsonl I quickly found that while ~77k examples have attribution, ~10k examples don't.

Is this expected or a bug?

I checked nq-dev-kilt.jsonl and found every example there has attribution data.
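
For anyone who wants to reproduce the counts, here is a minimal sketch of one way to do the check. It assumes each line of the file is a JSON record with an "output" list, where an entry may carry a "provenance" list whose items hold a wikipedia_id. The helper names has_attribution and count_attribution are made up for illustration and are not part of the KILT library.

```python
import json
from typing import Iterable, Tuple


def has_attribution(record: dict) -> bool:
    """Return True if any output entry has a provenance item with a wikipedia_id.

    Assumes the record layout described above: "output" -> "provenance" -> "wikipedia_id".
    """
    for output in record.get("output") or []:
        for prov in output.get("provenance") or []:
            if prov.get("wikipedia_id"):
                return True
    return False


def count_attribution(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (with_attribution, without_attribution) over an iterable of JSONL lines."""
    with_attr = 0
    without_attr = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if has_attribution(json.loads(line)):
            with_attr += 1
        else:
            without_attr += 1
    return with_attr, without_attr


# Example usage (the file path is whatever your local copy is called):
# with open("nq-train-kilt.jsonl", encoding="utf-8") as f:
#     print(count_attribution(f))
```

A record counts as attributed here if at least one answer has at least one provenance entry with a non-empty wikipedia_id. A stricter check (every answer must have provenance) could give different numbers.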