ProjetPP / PPP-QuestionParsing-Grammatical

Question Parsing module for the PPP using a grammatical approach
GNU Affero General Public License v3.0

"What is the continent of Argentina, Brazil and Chile?" #103

Open Ezibenroc opened 9 years ago

Ezibenroc commented 9 years ago

Produces:

{
    "list": [
        {
            "subject": {
                "value": "Argentina", 
                "type": "resource"
            }, 
            "object": {
                "type": "missing"
            }, 
            "type": "triple", 
            "predicate": {
                "value": "continent", 
                "type": "resource"
            }
        }, 
        {
            "subject": {
                "value": "Brazil Chile", 
                "type": "resource"
            }, 
            "object": {
                "type": "missing"
            }, 
            "type": "triple", 
            "predicate": {
                "value": "continent", 
                "type": "resource"
            }
        }
    ], 
    "type": "intersection"
}

Reason: during preprocessing we merge nodes that should not be merged. [image: tmp]

Tree given by the Stanford parser (before preprocessing): [image: tmp]

It looks great. It still works if you add elements to the conjunction (try to add Bolivia and Peru for instance).

Solution: do not merge nodes with conj_* dependency during the NER merging.
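
A minimal sketch of that rule, assuming a simplified tree in which every token stores its NER tag and the label of the dependency edge to its parent. The `Token` class and the `can_merge` helper are hypothetical names used for illustration; they are not the module's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical stand-in for a node of the dependency tree."""
    word: str
    ner_tag: str                      # e.g. "LOCATION", or "O" when not an entity
    dependency: str = "root"          # label of the edge to the parent
    parent: Optional["Token"] = None


def can_merge(a: Token, b: Token) -> bool:
    """Decide whether two tokens may be merged into one named entity."""
    # Only tokens carrying the same, non-empty entity tag are candidates.
    if a.ner_tag == "O" or a.ner_tag != b.ner_tag:
        return False
    # Parent/child pair: refuse when the edge between them is a conj_* edge.
    if b.parent is a:
        return not b.dependency.startswith("conj")
    if a.parent is b:
        return not a.dependency.startswith("conj")
    # Otherwise (e.g. siblings): refuse when either token is a conjunct.
    return not (a.dependency.startswith("conj") or b.dependency.startswith("conj"))


# "... of Argentina, Brazil and Chile": Brazil and Chile are conjuncts of Argentina.
argentina = Token("Argentina", "LOCATION", "prep_of")
brazil = Token("Brazil", "LOCATION", "conj_and", argentina)
chile = Token("Chile", "LOCATION", "conj_and", argentina)

# "... Paris and New York": "New" must still be merged with "York".
paris = Token("Paris", "LOCATION", "prep_of")
york = Token("York", "LOCATION", "conj_and", paris)
new = Token("New", "LOCATION", "nn", york)
```

With this rule "Brazil" and "Chile" stay separate, while a multi-word entity that happens to be a conjunct ("New York") is still merged.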

yhamoudi commented 9 years ago

I removed the merging for conjunctions between a son and its father for the same reason (here, nothing is merged with Argentina).

However, I don't know if it's relevant here. We have to make the choice that lets us cover as many questions as possible (or find a way to distinguish useful merging from useless merging...)

Ezibenroc commented 9 years ago

Now we don't merge these nodes in the preprocessing: https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical/commit/ba36c8467d48ece99f2ae313ff536971782b2c99

We still produce a wrong tree: [image: tmp]

The tree after the preprocessing is as follows: [image: tmp]

Ezibenroc commented 9 years ago

I think the problem comes from the handling of the conj dependency: we only considered conjunctions of two elements...

Ezibenroc commented 9 years ago

@yhamoudi what do you think of this adaptation of the algorithm to handle conj dependencies? [image: conj_or-crop]

yhamoudi commented 9 years ago

It seems to be the natural way to extend what we currently do, so why not (perhaps try it on some examples first: does it give what we want on "What is the continent of Argentina, Brazil and Chile?"?)

Currently the rebalancing operation for conjunctions is performed by conjConnectorsUp (the ugliest function ever...). We can try to adapt the code (not so easy), or rewrite it more cleanly (if you have an idea...)
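
As an illustration only, here is one way such a rewrite could look for conjunctions of any size. It is a sketch under assumptions, not the behaviour of conjConnectorsUp: the `Node` class and `lift_conjunction` function are hypothetical, and all conjuncts of a head are assumed to share the same connector.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical dependency-tree node: a word, its edge label, its children."""
    word: str
    dependency: str = "root"
    children: List["Node"] = field(default_factory=list)


def lift_conjunction(head: Node) -> Node:
    """Replace `head` by a connector node whose children are all the conjuncts.

    Every child attached to `head` by a conj_* edge is moved up, next to
    `head`, under a new node named after the connector ("and", "or", ...).
    """
    conjuncts = [c for c in head.children if c.dependency.startswith("conj_")]
    if not conjuncts:
        return head  # nothing to rebalance
    # Assumption: all conjuncts use the same connector as the first one.
    connector = conjuncts[0].dependency[len("conj_"):]
    # The head keeps only its non-conjunct children.
    head.children = [c for c in head.children
                     if not c.dependency.startswith("conj_")]
    # The connector node takes the place of the head in the tree.
    return Node(connector, head.dependency, [head] + conjuncts)


argentina = Node("Argentina", "prep_of", [
    Node("Brazil", "conj_and"),
    Node("Chile", "conj_and"),
])
lifted = lift_conjunction(argentina)
```

Here `lifted` is an "and" node with the three countries as children, whatever the number of conjuncts.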

Ezibenroc commented 9 years ago

> does it give what we want on "What is the continent of Argentina, Brazil and Chile?"?

Yes. It gives a subtree with an "and" node and 3 children: Argentina, Brazil and Chile. We would then have to adapt the Normalize function to handle an arbitrary number of children for the "and" and "or" nodes, but this is very natural.
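
A rough sketch of what normalising such an n-ary connector could produce, using the same JSON shape as the output at the top of this issue. The helper names are hypothetical, and mapping "or" to a "union" node is an assumption (only "intersection" appears in this thread).

```python
import json

# Assumption: "and" maps to "intersection" (as in the output above);
# "or" mapping to "union" is a guess made for this sketch.
CONNECTOR_TYPES = {"and": "intersection", "or": "union"}


def triple(subject, predicate):
    """Build a triple with a missing object, as in the issue's output."""
    return {
        "type": "triple",
        "subject": {"type": "resource", "value": subject},
        "predicate": {"type": "resource", "value": predicate},
        "object": {"type": "missing"},
    }


def normalize_connector(connector, predicate, conjuncts):
    """Emit one triple per conjunct, however many children the node has."""
    return {
        "type": CONNECTOR_TYPES[connector],
        "list": [triple(word, predicate) for word in conjuncts],
    }


expected = normalize_connector("and", "continent",
                               ["Argentina", "Brazil", "Chile"])
print(json.dumps(expected, indent=4))
```

For the question in the title this yields an intersection of three triples, one each for Argentina, Brazil and Chile, instead of the two triples shown above.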

> Currently the rebalancing operation for conjunctions is performed by conjConnectorsUp (the ugliest function ever...). We can try to adapt the code (not so easy), or rewrite it more cleanly (if you have an idea...)

Yes it would be better to rewrite it. I will think about that.

yhamoudi commented 9 years ago

I realize that the article does not describe the transformation accurately. Here is what we really do in the case of 2 elements (the generalisation is straightforward): [image]