emorynlp / nlp4j-old

NLP tools developed by Emory University.

Dep parser appears to produce multiple nodes with the head of the tree as the root. #25

Closed benson-basis closed 8 years ago

benson-basis commented 8 years ago

I need to represent the output of the dependency parser as a collection of dependency tuples: [deprel, governor, dependent].

So, I wrote:

private Dependency nodeToDep(NLPNode n) {
    // Shift node IDs down by one to get token offsets; a head ID of 0 (the root) becomes -1.
    int gov = n.getDependencyHead().getID();
    return new Dependency.Builder(n.getDependencyLabel(), gov - 1, n.getID() - 1).build();
}

This is working fine for many examples. However, for this sentence:

and the modern, lightweight, steel, collapsible wheelchair was created by Harry Jennings and his disabled friend Herbert Everest, in 1933.

I end up with two nodes with a head of 0. My representation is below; see the two occurrences of 'ROOT'. Am I misinterpreting the output data structure?

cc(created-12, and-1)
det(modern-3, the-2)
conj(ROOT-0, modern-3)
punct(modern-3, ,-4)
conj(modern-3, lightweight-5)
punct(lightweight-5, ,-6)
conj(lightweight-5, steel-7)
punct(created-12, ,-8)
nmod(wheelchair-10, collapsible-9)
nsubjpass(created-12, wheelchair-10)
auxpass(created-12, was-11)
root(ROOT-0, created-12)
agent(created-12, by-13)
compound(Jennings-15, Harry-14)
pobj(by-13, Jennings-15)
cc(Jennings-15, and-16)
poss(friend-19, his-17)
nmod(friend-19, disabled-18)
conj(Jennings-15, friend-19)
compound(Everest-21, Herbert-20)
appos(friend-19, Everest-21)
punct(created-12, ,-22)
prep(created-12, in-23)
pobj(in-23, 1933-24)
punct(created-12, .-25)

benson-basis commented 8 years ago

I found my mistake: I split the decode process into running the tagger and then later running the parser, and I messed up the book-keeping. Sorry for the noise.
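
For anyone who lands here with the same symptom, a small sanity check on the converted tuples can flag this kind of book-keeping slip early. The following is only a sketch: `RootCheck` and `Dep` are hypothetical names, not part of NLP4J, and `Dep` is a stand-in for the `Dependency` class above. It assumes the indexing used in the printed output, where tokens are numbered from 1 and a governor of 0 means ROOT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RootCheck {
    /** Hypothetical stand-in for a dependency tuple: label(governor, dependent), 0 = ROOT. */
    static final class Dep {
        final String label;
        final int governor;
        final int dependent;

        Dep(String label, int governor, int dependent) {
            this.label = label;
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    /** Collects the indices of all tokens attached directly to ROOT (governor == 0). */
    static List<Integer> rootDependents(List<Dep> deps) {
        List<Integer> roots = new ArrayList<>();
        for (Dep d : deps) {
            if (d.governor == 0) {
                roots.add(d.dependent);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // A few tuples copied from the output above, including both ROOT attachments.
        List<Dep> deps = Arrays.asList(
                new Dep("conj", 0, 3),
                new Dep("conj", 3, 5),
                new Dep("nsubjpass", 12, 10),
                new Dep("root", 0, 12));

        List<Integer> roots = rootDependents(deps);
        if (roots.size() != 1) {
            // Prints: Expected exactly one root, found 2: [3, 12]
            System.out.println("Expected exactly one root, found " + roots.size() + ": " + roots);
        }
    }
}
```

A well-formed single-sentence parse should yield exactly one token attached to ROOT, so anything else suggests the token-to-node mapping went wrong somewhere upstream rather than in the parser itself.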