Closed yhamoudi closed 9 years ago
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
34 commits with 1,295 additions and 955 deletions
Even worse than last pull request :cry:
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
I'll do it.
I'll do it.
Thank you :-)
Congratulations on the amazing work! I have weeks of work ahead of me to make the Wikidata module able to answer the questions that are now parsed correctly.
If you have some time, it would be nice to check that the current random questions still work (maybe by adding them to the deep tests): https://github.com/ProjetPP/PPP-WebUI/blob/master/questions.js
Done (I didn't include questions that are very similar).
It looks like there is an issue with the reverse predicate. We usually state (Le Petit Prince, author, Saint-Exupéry), and that's what is done in other tests like "Who wrote \"Le Petit Prince\" and \"Vol de Nuit\"".
Indeed, `author` is not relevant here. I've added `author by` to the nounification map. We obtain:

```
'Which books were authored by Victor Hugo?':
    I([
        T(M(), R('instance of'), R('book')),
        T(M(), Li([R('authored by'), R('author')]), R('Victor Hugo'))
    ]),
```
Keep in mind that the normal forms here https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/blob/reverse_predicates/tests/data_deep.py are "subsets" of the normal forms output by the question parsing (i.e. there are many more (inverse) predicates in practice, and most of them are not totally relevant).
Sorry for all these comments. Some of your modifications need to be reverted (for instance, no nounification is applied to nouns, so you cannot extract the predicate `bibliography` from the question `author of 1984`). Also be careful not to change the upper/lowercase letters (the parsing depends on them).
Here are some of the nounifications to add in order to correct some of your additions: I can add them if you want (or use code `0` to add a ->, and code `1` for <-).
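Purely as an illustration of the 0/1 direction codes mentioned above (the entries and the format below are hypothetical placeholders, not the actual list to add nor the module's file format):

```python
# Hypothetical placeholders only, illustrating the direction codes described
# above: code 0 adds the noun as a "->" predicate, code 1 as a "<-" predicate.
example_nounifications = [
    ("some verb", 0, "some noun"),    # 0: forward predicate (->)
    ("other verb", 1, "other noun"),  # 1: inverse predicate (<-)
]
```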
Also be careful not to change the upper/lowercase letters (the parsing depends on them).
This should not happen... I think it would be preferable to remove these questions.
Here are some of the nounifications to add in order to correct some of your additions:
I don't think this is the right thing to do. We do not want the preposition in the nounification.
This should not happen... I think it would be preferable to remove these questions.
Why should this not happen? That's how the Stanford parser works, and there is no problem with this.
We do not want the preposition in the nounification.
Why? A verb in English can be made of several words (kill by, look after, look for, bury in, ...).
You shouldn't revert your whole commit; just add the missing nounifications and remove the 3 cases where you try to nounify nouns (and keep the lowercase letters).
Why should this not happen? That's how the Stanford parser works, and there is no problem with this.
Yes, there is a problem. If a question like "where is foo" works, then the user wants "Where is foo?" to work too. This capital letter does not change the grammatical structure of the sentence, so it should not change the output of the Stanford parser.
Why? A verb in English can be made of several words (kill by, look after, look for, bury in, ...).
"look after" and "look for" are different: the preposition gives a (very) different meaning to the verb. But "kill by" and "kill" have approximately the same meaning, "bury in" and "bury" too (and this approximation is well enough for what we are doing). This will simplify our work when we will want to modify the nouns of the nounification: modifying 1 list instead of 3 or 4 is cool, especially if you have to repeat 100 times this modification.
For the nounification of verbs with a preposition, we can first check whether the word made of verb+prep exists in the database. If so, take its nouns. Otherwise, take the nouns of the verb.
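A minimal sketch of this fallback, assuming a plain dictionary keyed by either "verb" or "verb prep" strings (`NOUNIFICATION_DB` and `nounify` are illustrative names, not the module's actual identifiers):

```python
# Minimal sketch of the verb+prep fallback described above; not the module's
# actual code, names are illustrative.
NOUNIFICATION_DB = {
    "author by": ["author"],        # verb+prep entry (like the one added in this PR)
    "write": ["author", "writer"],  # bare verb entry
}

def nounify(verb, prep=None):
    """Return the nouns for verb+prep if that entry exists,
    otherwise fall back to the nouns of the bare verb."""
    if prep is not None:
        nouns = NOUNIFICATION_DB.get("%s %s" % (verb, prep))
        if nouns is not None:
            return nouns
    return NOUNIFICATION_DB.get(verb, [])

print(nounify("author", "by"))   # ['author']: the verb+prep entry is used
print(nounify("write", "down"))  # ['author', 'writer']: no "write down" entry, falls back to "write"
```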
Yes, there is a problem. If a question like "where is foo" works, then the user wants "Where is foo?" to work too.
This is why there are deep tests with lowercase letters. If you remove them, we cannot track the problem...
But "kill by" and "kill" have approximately the same meaning
"kill" and "kill by" have exactly the opposite meaning ("bla killed bli" <-> "bli is killed by bla"). If we do not take prepositions into account (for instance, use a same entry for "kill" and "kill by" into the database), then we should stop to distinguish predicates/inverse_predicates (you will nounify "bla killed bli" into (bla,[killed by,killer,killed,murderer],bli,[killed by,killer,killed,murderer])
and "bli is killed by bla" into (bli,[killed by,killer,killed,murderer],bla,[killed by,killer,killed,murderer])
)
Same thing for "bury" and "bury in". There is a big difference between "bla buried bli" and "bli is buried in ...".
For the nounification of verbs with a preposition, we can first check whether the word made of verb+prep exists in the database. If so, take its nouns. Otherwise, take the nouns of the verb.
This is actually what is done...
"kill" and "kill by" have exactly the opposite meaning ("bla killed bli" <-> "bli is killed by bla").
Yes. But `(Kennedy, killed by, Oswald)` and `(Kennedy, killer, Oswald)` are correct triples and mean the same thing. I cannot think of a sentence where this would not be the case.
Here you nounify `kill by` with its inverse predicate. We cannot say that the predicates associated with a verb+prep are always the inverse predicates associated with the verb, so we cannot remove verb+prep from the database.
We could try to avoid verb+by entries (it seems to always require the inverse predicate of the verb), but that's another problem.
```
'From which country is Alan Turing?':
    I([
        T(M(), R('instance of'), R('country')),
        T(R('Alan Turing'), R('country of citizenship'), M())
    ]),
```
Where does the "country of citizenship" come from?
Some of the questions you highlighted were taken from the dataset of 5,000 questions used by question answering contests. I don't know if they're wrong, but if they were used there, these questions are probably asked in practice. We shouldn't change them:
Where does the "country of citizenship" come from?
Eh eh, it's magic (in fact, we use `From which` to replace `is`).
Some of the questions you highlighted were taken from the dataset of 5,000 questions used by question answering contests. I don't know if they're wrong, but if they were used there, these questions are probably asked in practice. We shouldn't change them:
Yes, this is why we should keep them in the tests. But I put the comments to keep in mind that these questions are grammatically incorrect, and that we fail to parse them if they are corrected (the "did" messes everything up).
But I put the comments to keep in mind that these questions are grammatically incorrect
Are you sure that they're wrong? For instance, there is no result for such a query, and I find some of your sentences really strange ("Who did invent the hula hoop?", "What king did rule on France?"...). Moreover, the Stanford parser shouldn't fail on all these questions if they were grammatically correct.
I think this rule applies to:
There are many important modifications in this pull request. I won't detail everything, but I can explain whatever you want.
Improvements:
- `Word` and `DependenciesTree` are now in the same file, `dependencyTree.py`. I think it's more readable now. See the demo files to understand which functions need to be called now.
- `DependenciesTree`: the lemmatization (which produces the alternative predicates) is performed at the end, in `normalize.py`.