Tokeniser on displaCy not consistent.

explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

https://spacy.io

MIT License


Tokeniser on displaCy not consistent. #480

Closed: davidkell closed this 8 years ago

davidkell commented 8 years ago

Hi there,

I'm using the (wonderful!) displaCy visualiser to prototype some ideas. However, I keep seeing inconsistencies between the POS tagger in displaCy and the POS tagger in the spaCy Python package, and displaCy is usually the one that's correct. Is this expected behaviour?

Thanks in advance!

Dave

honnibal commented 8 years ago

Hey,

Could you give some examples? At a guess, you might be referring to some extra token merging that displaCy does to post-process the spaCy output. The function can be seen here: https://github.com/spacy-io/displacy-server/blob/master/displacy/handlers.py#L31
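To illustrate how a post-processing merge can make displaCy's output look different from the raw spaCy output, here is a minimal pure-Python sketch. It is an assumption-laden toy, not the logic in the linked handler: tokens are modelled as hand-written `(text, tag)` pairs rather than spaCy `Token` objects, and the merge rule (attach punctuation to the preceding token, keeping that token's tag) and the helper name `merge_punct` are hypothetical.

```python
from typing import List, Tuple

# Simplified token model: (text, POS tag). Hypothetical, not spaCy's Token type.
Token = Tuple[str, str]


def merge_punct(tokens: List[Token]) -> List[Token]:
    """Hypothetical post-processing step: glue each PUNCT token onto the
    token before it, keeping the preceding token's tag."""
    merged: List[Token] = []
    for text, tag in tokens:
        if tag == "PUNCT" and merged:
            prev_text, prev_tag = merged[-1]
            merged[-1] = (prev_text + text, prev_tag)
        else:
            # Ordinary tokens (and punctuation with nothing before it) are kept as-is.
            merged.append((text, tag))
    return merged


# Hand-written tags for illustration only; not real tagger output.
raw = [("Hello", "INTJ"), (",", "PUNCT"), ("world", "NOUN"), ("!", "PUNCT")]
print(merge_punct(raw))  # [('Hello,', 'INTJ'), ('world!', 'NOUN')]
```

After a merge like this, the visualised token list is shorter than the one the Python package returns, so a token-by-token comparison of tags or arcs between the two can look "inconsistent" even when the underlying analysis is the same.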

aoldoni commented 8 years ago

Hi @honnibal,

Thanks for spaCy. I believe I have an example where the dependency tree from displaCy differs from the one produced by spaCy.

Take the following sentence: "A two-tier scheme (Pang and Lee, 2004) where sentences are first classified as subjective versus objective , and then applying the sentiment classifier on only the subjective sentences further improves performance."

The difference is:

displaCy online: "improves" -> nsubj -> "scheme".
spaCy Python package: "improves" -> csubj -> "applying".

The displaCy version is the correct one.
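One way to make a report like this easier to reproduce is to write each parse as a set of (head, relation, dependent) triples and diff the two sets. The sketch below is a minimal illustration under that assumption: it contains only the two arcs quoted above, typed in by hand, so it is not a full parse of the sentence, and `diff_arcs` is a hypothetical helper. With spaCy, such triples could be collected as `(t.head.text, t.dep_, t.text)` for each token `t` in a parsed `Doc`.

```python
from typing import Set, Tuple

# (head, relation, dependent) triples. Only the arcs quoted in the report,
# entered by hand; not a complete parse.
Arc = Tuple[str, str, str]

displacy_arcs: Set[Arc] = {("improves", "nsubj", "scheme")}
spacy_arcs: Set[Arc] = {("improves", "csubj", "applying")}


def diff_arcs(a: Set[Arc], b: Set[Arc]) -> Tuple[Set[Arc], Set[Arc]]:
    """Return (arcs found only in a, arcs found only in b)."""
    return a - b, b - a


only_displacy, only_spacy = diff_arcs(displacy_arcs, spacy_arcs)
for head, rel, dep in sorted(only_displacy):
    print(f"displaCy only: {head} -{rel}-> {dep}")
for head, rel, dep in sorted(only_spacy):
    print(f"spaCy only:    {head} -{rel}-> {dep}")
```

Using sets ignores token order and duplicates, which is usually fine for spotting a changed head or label; if the same word appears twice in a sentence, including token indices in the triples would avoid false matches.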

Thanks!

honnibal commented 8 years ago

I'm going to tentatively close this — hopefully whatever the issue was, it's resolved in 1.0. Please reopen if things are still weird.

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.