Duplicates in movies dataset

LiberAI / NSpM

🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.

MIT License

Hello!

First, thank you very much for your repository and research. This is a very interesting field. I am currently using your monument dataset as training data for my master's thesis.

I noticed that you uploaded a new dataset called movies_300.zip several days ago. I intended to try it in my experiments as well, but I found that the training file contains many duplicate lines (e.g. "how long is the longest movie" appears 227 times in 'train.en'). Could you explain the reason for that? Is this dataset appropriate for training, or was it made for other tasks?
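
For anyone who wants to reproduce the count, here is a minimal sketch of one way to list repeated lines. It assumes 'train.en' is a plain UTF-8 text file with one question per line; the function name `count_duplicate_lines` and the `top_n` parameter are placeholders chosen for this example and are not part of the NSpM code.

```python
from collections import Counter
from pathlib import Path


def count_duplicate_lines(path, top_n=10):
    """Return up to top_n (line, count) pairs for lines occurring more than once.

    Assumption: the file is UTF-8 text with one example per line.
    Blank lines are ignored and surrounding whitespace is stripped.
    """
    text = Path(path).read_text(encoding="utf-8")
    counts = Counter(
        line.strip() for line in text.splitlines() if line.strip()
    )
    # most_common() orders by descending count; keep only real duplicates.
    duplicates = [(line, n) for line, n in counts.most_common() if n > 1]
    return duplicates[:top_n]


# Example usage (the path is an assumption about where the archive was unzipped):
# for line, n in count_duplicate_lines("train.en"):
#     print(f"{n:6d}  {line}")
```

Running the same check on the matching SPARQL-side file would show whether the repeated questions also map to repeated queries.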

Thank you and best regards,
Xiaoyu

Duplicates in movies dataset #9