Clarification and Guidance Request on CWQ Dataset Preprocessing

IDEA-FinAI / ToG

This is the official GitHub repo of Think-on-Graph. If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email (xuchengjin@idea.edu.cn).


Hi,

1. If I read the CWQ dataset file correctly, it actually contains 3,531 samples.

2. Here is the pipeline for preprocessing the dataset. First, prompt the LLM to extract the entities. Second, use the Wikidata API we defined in `Wikidata` to convert each entity name into a QID (`label2qid`). Third, use the same Wikidata API to convert each QID into a Freebase MID (`qid2mid`).

3. This is because some samples in the CWQ test set may come from WebQSP. However, that is a consequence of how the dataset was constructed and has nothing to do with our algorithm; please refer to the dataset's paper for more details.
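To double-check a sample count like the one in point 1, a minimal sketch such as the following can be used. It assumes the CWQ file is either a single JSON array with one element per question or, as a fallback, JSON Lines (one object per line). `count_samples` is a hypothetical helper, not part of the ToG repo, and no specific file path is assumed.

```python
import json
from pathlib import Path


def count_samples(path: str) -> int:
    """Count the samples in a dataset file (hypothetical helper).

    Assumption: the file is a single top-level JSON array, one element per
    question; if it is not a single JSON document, it is treated as JSON Lines.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Several JSON documents in one file: count non-empty lines (JSON Lines).
        return sum(1 for line in text.splitlines() if line.strip())
    if isinstance(data, list):
        return len(data)
    raise ValueError("expected a top-level JSON array")
```

Usage: `count_samples("<path to the CWQ file>")`. If the file turns out to be a JSON object wrapping the questions, the helper raises instead of guessing which key holds them.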
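The name-to-QID-to-MID steps in point 2 can be sketched as follows. This is a minimal illustration, not the repo's `label2qid`/`qid2mid` code: it assumes the public Wikidata endpoints `wbsearchentities` (taking the top search hit as the QID) and `wbgetentities` (reading property P646, "Freebase ID"), and it assumes the knowledge graph expects MIDs in the dotted `m.xxx` form rather than Wikidata's `/m/xxx` form. The LLM extraction step is assumed to have already produced a list of entity names. The `fetch` parameter is a hypothetical hook so the lookups can be stubbed without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
FREEBASE_ID_PROP = "P646"  # Wikidata property "Freebase ID"

Fetch = Callable[[str], dict]


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (needs network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": "cwq-preprocess-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def search_url(label: str) -> str:
    """Query for the best-matching Wikidata item for a name."""
    params = {"action": "wbsearchentities", "search": label, "language": "en",
              "type": "item", "limit": 1, "format": "json"}
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def entity_url(qid: str) -> str:
    """Query for an item's claims only."""
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def label2qid(label: str, fetch: Fetch = fetch_json) -> Optional[str]:
    """Name -> QID, assuming the top search hit is the intended entity."""
    hits = fetch(search_url(label)).get("search", [])
    return hits[0]["id"] if hits else None


def to_dotted_mid(value: str) -> str:
    """'/m/0abc' (Wikidata P646 format) -> 'm.0abc' (assumed KG format)."""
    return value.lstrip("/").replace("/", ".", 1)


def qid2mid(qid: str, fetch: Fetch = fetch_json) -> Optional[str]:
    """QID -> Freebase MID via the item's P646 claim, if it has one."""
    entity = fetch(entity_url(qid)).get("entities", {}).get(qid, {})
    for claim in entity.get("claims", {}).get(FREEBASE_ID_PROP, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, str):
            return to_dotted_mid(value)
    return None


def preprocess(entity_names: List[str], fetch: Fetch = fetch_json) -> Dict[str, Optional[str]]:
    """Map each LLM-extracted name to a MID, or None if either lookup fails."""
    result: Dict[str, Optional[str]] = {}
    for name in entity_names:
        qid = label2qid(name, fetch)
        result[name] = qid2mid(qid, fetch) if qid else None
    return result
```

Taking the first search hit is a simplification: an ambiguous name can resolve to the wrong item, and items without a P646 claim yield `None`, so unresolved entities should be checked rather than silently dropped.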


Clarification and Guidance Request on CWQ Dataset Preprocessing #17