explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Dependency Parser get incorrect result #3145

Closed: sina-b closed this issue 5 years ago

sina-b commented 5 years ago

How to reproduce the behaviour

I just tested the dependency parser with the en_core_web_lg v2.0.0 model in my code, and it gave me an incorrect parse. My input sentence is "John works in Germany." The resulting parse (index, text, dependency label, part of speech, head index, children) is:

0 John nsubj PROPN 1 []
1 works ROOT VERB 1 [John, in, .]
2 in prep ADP 1 [Germany]
3 Germany pobj PROPN 2 []
4 . punct PUNCT 1 []

My code looks like the following:

import spacy

# Load the model from a local directory (path as on the reporter's machine).
nlp_en = spacy.load('C:/Dev/Programs/spaCyModels/en_core_web_md-2.0.0/en_core_web_md/en_core_web_md-2.0.0')
doc_en = nlp_en('John works in Germany.')

for token in doc_en:
    # token.i is the token's position in the doc; token.head.i is the
    # position of its syntactic head. Using these avoids looking the head
    # up by its text, which breaks when the same word occurs twice.
    print(token.i, token.text, token.dep_, token.pos_, token.head.i,
          [child for child in token.children])

In the correct parse, "Germany" would directly depend on the root, while the preposition "in" depends on "Germany".

To double-check, I used your online parser, and it gives me the same result: https://explosion.ai/demos/displacy?text=John%20works%20in%20Germany.&model=en_core_web_lg&cpu=1&cph=1

Thanks for a quick reply! :)


honnibal commented 5 years ago

In the correct parse, "Germany" would directly depend on the root, while the preposition "in" depends on "Germany".

That's not the annotation scheme the parser has been trained with. While some annotation schemes strongly prefer lexical heads, it generally helps parser accuracy to have prepositions heading prepositional phrases, and this is the standard analysis for English.

So, the parse is not wrong.
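For readers who want the lexical-head analysis anyway, the difference between the two schemes can be sketched in plain Python. This is only an illustration, not part of spaCy: `to_lexical_heads` is a hypothetical helper, the token tuples are copied from the parse reported above, and the `obl` and `case` labels are borrowed from Universal Dependencies as an assumption about what a lexical-head scheme would call these relations. It handles only the simple prep -> pobj pattern.

```python
# Each token: (index, text, dep, head_index), copied from the reported parse.
PREP_HEADED = [
    (0, "John", "nsubj", 1),
    (1, "works", "ROOT", 1),
    (2, "in", "prep", 1),
    (3, "Germany", "pobj", 2),
    (4, ".", "punct", 1),
]


def to_lexical_heads(tokens):
    """Reattach each pobj to its preposition's head and hang the preposition on it.

    Hypothetical helper for illustration: the 'obl' and 'case' labels are an
    assumption borrowed from Universal Dependencies, and only the simple
    prep -> pobj pattern is rewritten.
    """
    by_index = {tok[0]: tok for tok in tokens}
    result = dict(by_index)
    for i, text, dep, head in tokens:
        if dep == "pobj" and by_index[head][2] == "prep":
            prep_i, prep_text, _, prep_head = by_index[head]
            # The noun takes over the preposition's attachment point ...
            result[i] = (i, text, "obl", prep_head)
            # ... and the preposition becomes a dependent of the noun.
            result[prep_i] = (prep_i, prep_text, "case", i)
    return [result[i] for i in sorted(result)]


lexical = to_lexical_heads(PREP_HEADED)
for row in lexical:
    print(*row)
```

With this rewrite, "Germany" attaches directly to "works" and "in" attaches to "Germany", which is the analysis the original report expected. The parser's own output is unchanged; the conversion is a post-processing step applied to it.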

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.