chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

IndexError: [E201] Span index out of range #314

Closed mzeidhassan closed 3 years ago

mzeidhassan commented 3 years ago

Hi @bdewilde,

I updated spaCy to the latest release, 2.3.5, but something seems to be wrong or incompatible.

Here is the error I am getting:

IndexError: [E201] Span index out of range.
Traceback:
File "py37/env/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "py37/env/deep_analysis.py", line 314, in <module>
    acronyms = textacy.extract.acronyms_and_definitions(doc)
File "py37/env/lib/python3.7/site-packages/textacy/extract.py", line 599, in acronyms_and_definitions
    window_ = window.text
File "span.pyx", line 503, in spacy.tokens.span.Span.text.__get__
File "span.pyx", line 190, in spacy.tokens.span.Span.__getitem__

Environment

Any idea?

Thanks in advance!

mzeidhassan commented 3 years ago

I also would like to confirm that everything was working just fine with spaCy 2.3.2

svlandeg commented 3 years ago

Hi all,

As @mzeidhassan mentions, this could indeed be related to a bug in spaCy where calling span.text on an empty Span would result in an IndexError.

There are a few ways of dealing with this:
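One defensive option on the caller's side, sketched here under assumptions and not taken from textacy or spaCy, is to check the span's length before reading its text. The names `safe_span_text` and `FakeEmptySpan` are hypothetical; the stand-in class only mimics the reported failure so the sketch runs without spaCy installed.

```python
def safe_span_text(span):
    """Return span.text, or an empty string when the span holds no tokens."""
    # For a spaCy Span, len(span) is its token count; zero means it is empty.
    if len(span) == 0:
        return ""
    return span.text


class FakeEmptySpan:
    """Hypothetical stand-in that mimics the reported error on an empty span."""

    def __len__(self):
        return 0

    @property
    def text(self):
        # Same message as in the traceback above, raised for illustration only.
        raise IndexError("[E201] Span index out of range.")


# The guard returns before .text is ever touched, so no IndexError is raised.
print(repr(safe_span_text(FakeEmptySpan())))
```

With a real spaCy `Span`, the same helper could be called in place of reading `.text` directly. Whether this is preferable to upgrading or pinning the library version depends on the setup.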

mzeidhassan commented 3 years ago

Hi @bdewilde, I am not sure if "self.text_with_ws" resolves the issue. Using "text_with_ws" extracts only the acronym itself, without its full wording. Am I missing something? Can you please let me know if there is a workaround to get it right? Thanks in advance!

mzeidhassan commented 3 years ago

I am closing this issue, because it is fixed in the latest release of Textacy. Thanks a million for the update.