Closed yhamoudi closed 9 years ago
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
34 commits with 1,295 additions and 955 deletions
Even worse than last pull request :cry:
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
I'll do it.
I'll do it.
Thank you :-)
Congratulations on the amazing work! I have weeks of work ahead of me to make the Wikidata module able to answer the questions that are now parsed correctly.
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
Done (I didn't include questions that are very similar).
It looks like there is an issue with the reverse predicate. We usually state (Le Petit Prince, author, Saint-Exupéry), and that's what is done in other tests like "Who wrote \"Le Petit Prince\" and \"Vol de Nuit\"".
Indeed, `author` is not relevant here. I've added `author by` to the nounification map. We obtain:

```
'Which books were authored by Victor Hugo?':
    I([
        T(M(), R('instance of'), R('book')),
        T(M(), Li([R('authored by'), R('author')]), R('Victor Hugo'))
    ]),
```
Keep in mind that the normal forms here https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/blob/reverse_predicates/tests/data_deep.py are "subsets" of the normal forms output by the question parsing (i.e. there are many more (inverse) predicates in practice, and most of them are not totally relevant).
Sorry for all these comments. Some of your modifications need to be reverted (for instance, no nounification is applied to nouns, so you cannot extract the predicate `bibliography` from the question `author of 1984`). Also be careful not to change the upper/lowercase letters (the parsing depends on them).
Here are some of the nounifications to add in order to correct some of your additions: I can add them if you want (or use code `0` to add a ->, and code `1` for <-).
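Purely as an illustration of the 0/1 direction codes mentioned above (the entries and the format below are hypothetical placeholders, not the actual list to add nor the module's file format):

```python
# Hypothetical placeholders only, illustrating the direction codes described
# above: code 0 adds the noun as a "->" predicate, code 1 as a "<-" predicate.
example_nounifications = [
    ("some verb", 0, "some noun"),    # 0: forward predicate (->)
    ("other verb", 1, "other noun"),  # 1: inverse predicate (<-)
]
```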
Also be careful not to change the upper/lowercase letters (the parsing depends on them).
This should not happen... I think it would be preferable to remove these questions.
Here are some of the nounifications to add in order to correct some of your additions:
I don't think this is the right thing to do. We do not want the preposition in the nounification.
This should not happen... I think it would be preferable to remove these questions.
Why should this not happen? That's how the Stanford parser works, and there is no problem with this.
We do not want the preposition in the nounification.
Why? A verb in English can be made of several words (kill by, look after, look for, bury in, ...).
You shouldn't revert your whole commit; just add the missing nounifications and remove the 3 cases where you try to nounify nouns (and keep the lowercase letters).
Why should this not happen? That's how the Stanford parser works, and there is no problem with this.
Yes, there is a problem. If a question like "where is foo" works, then the user wants "Where is foo?" to work too. This capital letter does not change the grammatical structure of the sentence, so it should not change the output of the Stanford parser.
Why? A verb in English can be made of several words (kill by, look after, look for, bury in, ...).
"look after" and "look for" are different: the preposition gives a (very) different meaning to the verb. But "kill by" and "kill" have approximately the same meaning, "bury in" and "bury" too (and this approximation is well enough for what we are doing). This will simplify our work when we will want to modify the nouns of the nounification: modifying 1 list instead of 3 or 4 is cool, especially if you have to repeat 100 times this modification.
For the nounification of verbs with a preposition, we can first check whether the word made of verb+prep exists in the database. If so, take its nouns. Otherwise, take the nouns of the verb.
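A minimal sketch of this fallback, assuming a plain dictionary keyed by either "verb" or "verb prep" strings (`NOUNIFICATION_DB` and `nounify` are illustrative names, not the module's actual identifiers):

```python
# Minimal sketch of the verb+prep fallback described above; not the module's
# actual code, names are illustrative.
NOUNIFICATION_DB = {
    "author by": ["author"],        # verb+prep entry (like the one added in this PR)
    "write": ["author", "writer"],  # bare verb entry
}

def nounify(verb, prep=None):
    """Return the nouns for verb+prep if that entry exists,
    otherwise fall back to the nouns of the bare verb."""
    if prep is not None:
        nouns = NOUNIFICATION_DB.get("%s %s" % (verb, prep))
        if nouns is not None:
            return nouns
    return NOUNIFICATION_DB.get(verb, [])

print(nounify("author", "by"))   # ['author']: the verb+prep entry is used
print(nounify("write", "down"))  # ['author', 'writer']: no "write down" entry, falls back to "write"
```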
Yes, there is a problem. If a question like "where is foo" works, then the user wants "Where is foo?" to work too.
This is why there are deep tests with lowercase letters. If you remove them, we cannot track the problem...
But "kill by" and "kill" have approximately the same meaning
"kill" and "kill by" have exactly the opposite meaning ("bla killed bli" <-> "bli is killed by bla"). If we do not take prepositions into account (for instance, use a same entry for "kill" and "kill by" into the database), then we should stop to distinguish predicates/inverse_predicates (you will nounify "bla killed bli" into (bla,[killed by,killer,killed,murderer],bli,[killed by,killer,killed,murderer])
and "bli is killed by bla" into (bli,[killed by,killer,killed,murderer],bla,[killed by,killer,killed,murderer])
)
Same thing for "bury" and "bury in". There is a big difference between "bla buried bli" and "bli is buried in ...".
For the nounification of verbs with a preposition, we can first check whether the word made of verb+prep exists in the database. If so, take its nouns. Otherwise, take the nouns of the verb.
This is actually what is done...
"kill" and "kill by" have exactly the opposite meaning ("bla killed bli" <-> "bli is killed by bla").
Yes. But `(Kennedy, killed by, Oswald)` and `(Kennedy, killer, Oswald)` are correct triples and mean the same thing. I cannot think of a sentence where this would not be the case.
Here you nounify `kill by` with its inverse predicate. We cannot say that the predicates associated with a verb+prep are always the inverse predicates associated with the verb, so we cannot remove verb+prep from the database.
We could try to avoid verb+by entries (it seems to always require the inverse predicate of the verb), but that's another problem.
```
'From which country is Alan Turing?':
    I([
        T(M(), R('instance of'), R('country')),
        T(R('Alan Turing'), R('country of citizenship'), M())
    ]),
```
Where does the "country of citizenship" come from?
Some of the questions you highlighted were taken from the dataset of 5,000 questions used by question answering contests. I don't know if they're wrong, but if they were used there, these questions are probably asked in practice. We shouldn't change them:
Where does the "country of citizenship" come from?
Eh eh, it's magic (in fact, we use `From which` to replace `is`).
Some of the questions you highlighted were taken from the dataset of 5,000 questions used by question answering contests. I don't know if they're wrong, but if they were used there, these questions are probably asked in practice. We shouldn't change them:
Yes, this is why we should keep them in the tests. But I put the comments to keep in mind that these questions are grammatically incorrect, and that we fail to parse them if they are corrected (the "did" messes everything up).
But I put the comments to keep in mind that these questions are grammatically incorrect
Are you sure that they're wrong? For instance, there is no result for such a query, and I find some of your sentences really strange ("Who did invent the hula hoop?", "What king did rule on France?"...). Moreover, the Stanford parser shouldn't fail on all these questions if they were grammatically correct.
I think this rule applies to:
There are many important modifications in this pull request. I won't detail everything, but I can explain whatever you want.
Improvements:
- `Word` and `DependenciesTree` are now in the same file, `dependencyTree.py`. I think it's more readable now. See the demo files to understand which functions need to be called now.
- `DependenciesTree`: the lemmatization (which produces the alternative predicates) is performed at the end, in `normalize.py`.