Where is the LC-QuAD dataset?

LiberAI / NSpM

🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.

http://aksw.org/Projects/NeuralSPARQLMachines

MIT License

Where is the LC-QuAD dataset? #29

Closed DonnieZhang586 closed 4 years ago

DonnieZhang586 commented 4 years ago

The LC-QuAD dataset has 5,000 pairs, but when I generated it from the lc-quad CSV file in the data path, the result came to hundreds of thousands of LC-QuAD sentence pairs. Could you please help me generate the accurate LC-QuAD dataset?

panchbhai1969 commented 4 years ago

Hi @DonnieZhang586 ,

Thank you for raising the issue.

I need more information to understand the issue. Could you shed more light on what you mean by generating it from the lc-quad CSV file in the data path: what is the data path here, and which file is the lc-quad CSV?

A script that reproduces the issue you are facing would be very helpful in this regard.

As far as "Where is the LC-QuAD dataset?" is concerned, you may find relevant information here: https://github.com/AskNowQA/LC-QuAD.

Cheers, Anand Panchbhai

DonnieZhang586 commented 4 years ago

Sorry, I didn't describe the problem clearly. I want to know how the 5,000 en-sparql statement pairs of the LC-QuAD dataset are generated. At present, I only see JSON files. I tried to extract the en-sparql statement pairs with my own method, but when I reproduced the en-sparql conversion with machine translation techniques, the results differed by 30 BLEU points. So I want to know how you obtained the 5,000 en-sparql sentence pairs. Can you describe your generation process in detail?

Best wishes,

Donnie Zhang

edgardmarx commented 4 years ago

Hello Donnie,

I guess there are some misunderstandings here. LC-QuAD is a benchmark dataset for QA, as is QALD. We have created a large dataset called DBNQA to support neural question answering over DBpedia, which can be found here: https://github.com/AKSW/DBNQA. This dataset contains QA templates extracted from both QALD and LC-QuAD. You can read more about it here: https://www.researchgate.net/publication/324482598_Generating_a_Large_Dataset_for_Neural_Question_Answering_over_the_DBpedia_Knowledge_Base. DBNQA is well known to achieve a better F-measure than LC-QuAD alone; in fact, according to Yin et al. (https://arxiv.org/abs/1906.09302), it can deliver up to 50% better F-measure.
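
For readers who, like the original poster, only have the LC-QuAD JSON files and want plain en-sparql pairs, the following is a minimal sketch and not the procedure used by NSpM or DBNQA. It assumes each JSON record carries the question under `corrected_question` and the query under `sparql_query`; both key names are assumptions about the file layout and are exposed as parameters, and `extract_pairs` and `load_pairs` are hypothetical helper names.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def extract_pairs(
    records: Iterable[Dict[str, Any]],
    question_key: str = "corrected_question",  # assumed field name
    sparql_key: str = "sparql_query",          # assumed field name
) -> List[Tuple[str, str]]:
    """Return one (question, sparql) pair per record that has both fields."""
    pairs = []
    for record in records:
        question = record.get(question_key)
        sparql = record.get(sparql_key)
        if not question or not sparql:
            continue  # skip incomplete records instead of guessing
        # Collapse runs of whitespace so each pair fits on one line.
        pairs.append((" ".join(question.split()), " ".join(sparql.split())))
    return pairs


def load_pairs(path: str, **keys: str) -> List[Tuple[str, str]]:
    """Read a JSON file holding a list of records and extract its pairs."""
    with open(path, encoding="utf-8") as handle:
        return extract_pairs(json.load(handle), **keys)


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from LC-QuAD.
    sample = [
        {
            "corrected_question": "Who wrote  Dracula?",
            "sparql_query": "SELECT ?a WHERE {\n  dbr:Dracula dbo:author ?a\n}",
        },
        {"corrected_question": "A record without a query"},
    ]
    for question, sparql in extract_pairs(sample):
        print(question + "\t" + sparql)
```

Because this produces at most one pair per record, the number of pairs can never exceed the number of records in the file. A count in the hundreds of thousands therefore points to a different process, such as filling templates with many entities.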