My understanding is that much of KILT's value comes from its gold attribution data. The KILT data format lists wikipedia_id as mandatory, but when I started working with nq-train-kilt.jsonl I quickly found that, while ~77k examples have attribution, ~10k examples don't.
Is this expected or a bug?
I checked nq-dev-kilt.jsonl and found every example there has attribution data.
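A minimal sketch of how such a count could be reproduced. It assumes each line of the file is a JSON object with an "output" list whose entries may carry a "provenance" list of items with a "wikipedia_id" field. The function names are hypothetical, and this is not necessarily how the numbers above were obtained.

```python
import json
from typing import Iterable, Tuple


def has_attribution(example: dict) -> bool:
    """Return True if any output entry has a provenance item with a wikipedia_id.

    Assumed layout: example["output"][i]["provenance"][j]["wikipedia_id"].
    """
    for output in example.get("output", []):
        for prov in output.get("provenance", []):
            if prov.get("wikipedia_id"):
                return True
    return False


def count_attribution(lines: Iterable[str]) -> Tuple[int, int]:
    """Count (with_attribution, without_attribution) over JSONL lines."""
    with_attr = 0
    without_attr = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if has_attribution(json.loads(line)):
            with_attr += 1
        else:
            without_attr += 1
    return with_attr, without_attr
```

Since a file object iterates over its lines, passing the result of open("nq-train-kilt.jsonl") to count_attribution should give the two counts for the training split, and the same call on nq-dev-kilt.jsonl allows a direct comparison.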