facebookresearch / KILT

Library for Knowledge Intensive Language Tasks
MIT License

Aligning new datasets into KILT format #24

Closed: eisenjulian closed this issue 3 years ago

eisenjulian commented 3 years ago

Congrats on the great work. I was thinking of aligning a new dataset to the KILT format to facilitate adoption. Can you comment on whether:

  1. You would consider additions to the benchmark if they fit the spirit of the task.
  2. The code that does the transform and alignment (for example, for FEVER or NQ) will be released. After a quick search I couldn't find it.

Thanks!

fabiopetroni commented 3 years ago

Hey @eisenjulian,

Thanks for your message and kind words.

  1. Definitely! We plan to keep adding datasets to the pool!

  2. Does your dataset use a different Wikipedia dump? If so, I can try to release a minimal script for the alignment.

Thanks, Fabio

eisenjulian commented 3 years ago

Hi @fabiopetroni, thanks for the quick response. Regarding 2, it does indeed use a different Wikipedia version, so having the standard logic used for the alignment would be super useful. Hopefully other people can use it as well to add more datasets.

Best, Julian

fabiopetroni commented 3 years ago

Hey @eisenjulian,

We just added an example script that maps a dataset to the KILT format. See https://github.com/facebookresearch/KILT/blob/master/scripts/map_datasets.py

:)
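
For readers landing on this thread, here is a rough sketch of what mapping one example onto a different Wikipedia dump might involve. It is not the logic in `map_datasets.py`. The source example layout (`question`, `answer`, `evidence`), the helper names `best_paragraph` and `to_kilt_record`, and the placeholder page are all hypothetical. The output field names follow my reading of the KILT README and should be checked against it. Paragraph matching uses `difflib` string similarity as a simple stand-in for whatever scoring the real alignment uses.

```python
import json
from difflib import SequenceMatcher


def best_paragraph(span, paragraphs):
    """Return (index, score) of the paragraph most similar to `span`.

    difflib's ratio is a stand-in similarity measure (an assumption,
    not the scoring used by the KILT authors).
    """
    scores = [
        SequenceMatcher(None, span.lower(), p.lower()).ratio()
        for p in paragraphs
    ]
    idx = max(range(len(paragraphs)), key=scores.__getitem__)
    return idx, scores[idx]


def to_kilt_record(example, page):
    """Build a KILT-style dict from a hypothetical source example and
    a page taken from the target Wikipedia dump."""
    idx, score = best_paragraph(example["evidence"], page["text"])
    return {
        "id": example["id"],
        "input": example["question"],
        "output": [
            {
                "answer": example["answer"],
                "provenance": [
                    {
                        "wikipedia_id": page["wikipedia_id"],
                        "title": page["wikipedia_title"],
                        "start_paragraph_id": idx,
                        "end_paragraph_id": idx,
                        # Hypothetical: keep the match quality for later filtering.
                        "meta": {"similarity": round(score, 3)},
                    }
                ],
            }
        ],
    }


# Hypothetical example from a dataset built on an older Wikipedia version.
example = {
    "id": "q1",
    "question": "Who wrote Hamlet?",
    "answer": "William Shakespeare",
    "evidence": "Hamlet is a tragedy written by William Shakespeare.",
}

# Placeholder page; the id is made up and the text is a list of paragraphs.
page = {
    "wikipedia_id": "0000",
    "wikipedia_title": "Hamlet",
    "text": [
        "Hamlet\n",
        "The Tragedy of Hamlet is a tragedy written by William Shakespeare "
        "sometime between 1599 and 1601.",
        "The play is set in Denmark.",
    ],
}

record = to_kilt_record(example, page)
print(json.dumps(record))  # one JSON object per line, as in a .jsonl file
```

In practice, examples whose best similarity falls below some threshold would probably need manual review or should be dropped, since the evidence may no longer exist in the newer dump.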