ram-g-athreya closed this issue 4 years ago
Hi @geraltofrivia , what do you think?
Thank you, @ram-g-athreya, for all the effort you've put into this. However, I discussed this internally and we're unsure about making the suggested changes to the dataset, for two major reasons:
What do you think? @ram-g-athreya @RicardoUsbeck
Hi, I think the syntactic and lexical fixes are not that valuable, since such errors are normally handled in a preprocessing step. However, your comment is worthwhile and makes sense, especially w.r.t. the leaderboard http://lc-quad.sda.tech/ (shameless plug of a GERBIL QA http://gerbil-qa.aksw.org/gerbil/config integration here).
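For context, such a preprocessing step can be as simple as a lookup-based normalization pass. This is only an illustrative sketch: the `SPELLING_FIXES` table and `normalize_question` helper are hypothetical examples of mine, not part of any LC-QuAD or GERBIL tooling.

```python
import re

# Hypothetical misspelling table for illustration; the actual
# errors in LC-QuAD questions would be collected from the data.
SPELLING_FIXES = {
    "wich": "which",
    "teh": "the",
    "paintor": "painter",
}

def normalize_question(question: str) -> str:
    """Replace known misspellings case-insensitively, leaving other text intact."""
    def fix(match):
        word = match.group(0)
        repl = SPELLING_FIXES.get(word.lower())
        if repl is None:
            return word
        # Preserve a leading capital from the original word.
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", fix, question)

print(normalize_question("Wich paintor painted teh Mona Lisa?"))
# -> Which painter painted the Mona Lisa?
```

A real pipeline would likely use an off-the-shelf spell checker instead of a hand-written table, but the shape of the step is the same: normalize before parsing or linking.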
Thus, I suggest releasing a 1.0 and a 1.fwe (fixed-writing-errors, or something like it) version here on GitHub and on the leaderboard, and indicating that 1.0 is from your original ISWC paper and used in the leaderboard, while 1.fwe contains the fixes in case you want to train your system on clean data.
By the way, for LC-QuAD 2.0 (which I am sure will come with way more questions, templates and challenges 🥇) one should also think about "data quality".
Releasing a fixed version for the train split does make sense. The test split can remain the same, but one can train on the fixed train version.
GERBIL sounds like a nice idea for evaluating over LC-QuAD. We had been planning to integrate it for quite some time, but never found the time to get it done. Maybe we can have a short discussion about it sometime in the future.
LC-QuAD 2.0 is in the pipeline. We have started the initial experiments, but there is still a long way to go :angel:
Personally, I think this is the right way to go. If @ram-g-athreya and @RicardoUsbeck think it prudent, shall I decline this PR and instead wait for @ram-g-athreya to send another with changes to the train data only?
Thanks for your replies.
Looking forward to your thoughts!
Excuse my continuous mumbling: I just remembered that we introduced syntactic and other user-driven mistakes in QALD-8, and the general response from the participants and the audience was that this was senseless. I have somewhere the results of a survey among 20 participants from QALD 1 to QALD 8 that covers this. Not sure if that is helpful or points in a direction.
Hi, I am wondering what the status of the corrections mentioned above is. Since more and more papers are using LC-QuAD 1.0 for evaluation, and those intentionally added mistakes make the data preprocessing stage tricky and less transparent, a cleaner version would help the community see what real performance those systems can achieve in terms of understanding the semantics of natural language queries. Thanks!
Hi @gychant, the authors of the paper (and dataset) believe that any approach tackling the challenge should be able to handle malformed spellings and minor noise in the questions. True, this makes the task more challenging, but we believe it's a step in the right direction.
Based on the above, we have decided to let these syntactic mistakes remain in the dataset, in both train and test instances. The former enables statistical models to train on noisy data, which is generally thought to make them more robust to noise when deployed. Likewise, noisy test instances ensure that performance on the dataset is more representative of how these approaches would fare were they used by the general public.
In effect, then, we will not be merging this PR into the repo.
> what real performance those systems can achieve
As mentioned above, real performance should not be thought of as an approach's ability to transform perfectly grammatical natural language into formal language instances, but rather as an estimate of how these approaches perform when used by real users. We cannot expect real users to write lexically and semantically correct questions, and thus we believe that performance on the dataset as it stands is a closer estimate of real performance than if we were to merge this PR.
Hi
I have been using the LC-QuAD dataset as part of my thesis. While using it, I made some corrections based on grammar or on the intermediate question template.
I was hoping these changes could be incorporated into the official dataset so that they improve its overall quality.
Feel free to reach out regarding any issues or concerns in this regard.
Thanks,
Ram G Athreya