emorynlp / nlp4j-old

NLP tools developed by Emory University.

Two root nodes from one sentence to the DP, any advice? #32

Open benson-basis opened 8 years ago

benson-basis commented 8 years ago

We trained a UD model on the UD treebank plus the WSJ converted to UD with the Stanford converter. Every so often, a sentence we parse comes out with a seemingly impossible structure containing an 'extra' root node. The cases we've seen have always involved the 'conj' label.

Does this suggest anything to you? I could share the data and/or the model file if you are interested.

In Anglo-American common law courts, appellate review of lower court decisions may also be obtained by filing a petition for review by prerogative writ in certain cases.

case(courts-5, In-1)
amod(courts-5, Anglo-American-2)
amod(courts-5, common-3)
compound(courts-5, law-4)
conj(ROOT-0, courts-5)
punct(courts-5, ,-6)
amod(review-8, appellate-7)
conj(courts-5, review-8)
case(decisions-12, of-9)
amod(decisions-12, lower-10)
compound(decisions-12, court-11)
nmod(review-8, decisions-12)
aux(obtained-16, may-13)
advmod(obtained-16, also-14)
auxpass(obtained-16, be-15)
root(ROOT-0, obtained-16)
mark(filing-18, by-17)
advcl(obtained-16, filing-18)
det(petition-20, a-19)
dobj(filing-18, petition-20)
case(review-22, for-21)
nmod(filing-18, review-22)
case(writ-25, by-23)
compound(writ-25, prerogative-24)
nmod(filing-18, writ-25)
case(cases-28, in-26)
amod(cases-28, certain-27)
nmod(filing-18, cases-28)
punct(obtained-16, .-29)

benson-basis commented 8 years ago

I've been able to demonstrate this using your stock English model.

As well as for complex voice emotional recognition for emotions not included in Mind Reading .

1       As      as      RB      _       3       advmod  _       @#r$%
2       well    well    RB      _       3       advmod  _       @#r$%
3       as      as      IN      _       5       advmod  _       @#r$%
4       for     for     IN      _       0       conj    _       @#r$%
5       complex complex JJ      _       8       nmod    _       @#r$%
6       voice   voice   NN      _       8       nmod    _       @#r$%
7       emotional       emotional       JJ      _       8       nmod    _       @#r$%
8       recognition     recognition     NN      _       4       pobj    _       @#r$%
9       for     for     IN      _       8       prep    _       @#r$%
10      emotions        emotion NNS     _       9       pobj    _       @#r$%
11      not     not     RB      _       12      neg     _       @#r$%
12      included        include VBN     pos2=VBD        0       root    _       @#r$%
13      in      in      IN      _       12      prep    _       @#r$%
14      Mind    mind    NN      pos2=NNP        15      compound        _       @#r$%
15      Reading reading NN      pos2=VBG        13      pobj    _       @#r$%
16      .       .       .       _       12      punct   _       @#r$%
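
A minimal sketch of how such trees can be flagged, assuming tab-separated NLP4J output with the column layout shown above (ID, FORM, LEMMA, POS, FEATS, HEAD, DEPREL, ...); the script and its names are illustrative, not part of NLP4J:

import sys

def read_sentences(path):
    # Blank lines separate sentences; each token row is split into its columns.
    sentence = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line.strip():
                if sentence:
                    yield sentence
                    sentence = []
            else:
                sentence.append(line.split('\t'))
    if sentence:
        yield sentence

total = multi = 0
for sentence in read_sentences(sys.argv[1]):
    total += 1
    # tok[5] is the HEAD column; head 0 means attachment to the artificial root
    roots = [tok for tok in sentence if tok[5] == '0']
    if len(roots) > 1:
        multi += 1
        print('multiple roots:', ' '.join(tok[1] for tok in sentence))
print('%d of %d sentences had more than one root' % (multi, total))
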
benson-basis commented 8 years ago

I tried adding a feature to the model.

Please don't laugh if my attempt to figure out the feature templates was unsuccessful.

My idea was to discourage things like a conj deprel attached to the root. Of course, a feature like this can't globally penalize sentences for having more than one root; if there's a way to introduce that idea into the feature set, I haven't figured it out yet.

   <feature f0="i:dependency_label" f1="i_h:part_of_speech_tag"/>

With my universal training data (UD treebank + converted PTB) and the UD dev set, the accuracy was the same:

UAS 0.88, LAS 0.85, 25148 total tokens

but the proportion of UD dev set parses that came out with two roots dropped from 4% to 3%.

benson-basis commented 8 years ago

We somewhat belatedly read the papers and understand that this is expected behavior, so we are thinking about how to cope.

jdchoi77 commented 8 years ago

Sorry for the late reply; I was meeting a grant proposal deadline. Multiple roots may be caused by headless nodes: when the parser doesn't find any head for a node, it connects the node to the root by default to keep the entire tree connected, but this is something I should revisit. I'm planning to adapt our structure to UD more closely now, so I can experiment with this more myself as well.
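
One possible way to cope with this in the meantime, sketched below purely as an assumption rather than anything NLP4J provides: post-process each tree so that only the token labeled root keeps the artificial root as its head, and reattach any other children of the root under it. The helper name repair_multiple_roots and the fallback label 'dep' are illustrative choices.

def repair_multiple_roots(sentence):
    # sentence: list of token rows in the TSV layout shown earlier
    # (tok[0] = ID, tok[5] = HEAD, tok[6] = DEPREL)
    roots = [tok for tok in sentence if tok[5] == '0']
    if len(roots) <= 1:
        return sentence
    # prefer the token the parser itself labeled 'root'; otherwise keep the first
    main_root = next((tok for tok in roots if tok[6] == 'root'), roots[0])
    for tok in roots:
        if tok is not main_root:
            tok[5] = main_root[0]  # reattach under the main root
            tok[6] = 'dep'         # fallback label; 'parataxis' would be another option
    return sentence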

benson-basis commented 8 years ago

OK, thanks.
