Closed buhrmann closed 5 years ago
Thanks for the report! The error seems to occur here when spaCy is trying to count the words to the root and fails.
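For readers unfamiliar with this error, here is a rough, pure-Python sketch of what "counting the words to the root" can look like. It is not spaCy's actual implementation. It assumes each token stores its head as a relative offset, with 0 marking the root, and the function name `steps_to_root` is hypothetical:

```python
def steps_to_root(head_offsets, start):
    """Count the hops from token `start` to its root.

    Assumption: head_offsets[i] is (index of head) - i, and a root
    token has offset 0. This only illustrates the idea.
    """
    n = len(head_offsets)
    i = start
    steps = 0
    while head_offsets[i] != 0:
        i += head_offsets[i]
        steps += 1
        # A head pointing outside the doc, or a cycle that never reaches
        # a root, means the tree is in an invalid state.
        if i < 0 or i >= n or steps > n:
            raise RuntimeError(
                "Array bounds exceeded while searching for root word"
            )
    return steps


valid = [1, 0, -1, -1]    # token 1 is the root
broken = [1, 0, -1, 5]    # token 3 points past the end of the doc
print(steps_to_root(valid, 3))
```

If a merge leaves even one head offset pointing outside the document, as in `broken`, a walk like this fails with an error of the kind reported here.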
I tried it locally, but I haven't been able to reproduce the problem 🤔 I ran your exact code with both spaCy v2.0.18 and spaCy v2.1.3. I also tried the new doc.retokenize context manager for comparison, and tried merging various combinations of tokens, including the emoji.
Hmm, ok, I'll try to check if it's to do with the version then.
Do you have anything else set up in your pipeline by any chance? Like spacymoji, etc.?
Hm, no, it failed with the exact code sample above, though now suddenly it seems to work! To be honest, I'm not sure whether the version of any related package has been updated in my environment, though I'm pretty sure neither spacy nor spacy_stanfordnlp has changed. My best guess is that perhaps the StanfordNLP model itself has been updated on the servers, but really I don't know... In any case, I think this can be closed as not reproducible.
Hi, it seems that in some cases of using StanfordNLP models the result is an invalid parse tree state. When trying to merge certain spans, I get:
RuntimeError: [E039] Array bounds exceeded while searching for root word. This likely means the parse tree is in an invalid state
Here is a reproducible example (at least for my installation), failing when trying to merge an emoji:
This doesn't seem to happen with a regular spaCy language (the tokenization is slightly different, but merging spans that include the same emoji works here):