Closed kyleclo closed 3 years ago
Yeah, Tom ran into this as well. It turns out that pysbd can split a sentence in the middle of a spaCy token, and I didn't handle that case; I'm not even sure what the right thing to do there would be. There seemed to be a fair number of different patterns that trigger this, and issues in pysbd haven't been fixed for quite a while. So at this point I think you should either 1) run pysbd on its own and then run scispacy over the individual sentences (I'm not sure whether other pysbd problems might make this hard), 2) use the built-in dependency-parser-based splitter, or 3) use the built-in rule-based splitter (https://spacy.io/api/sentencizer).
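For anyone trying option 1, here is a rough sketch of the "segment first, then parse each sentence" approach. The helper name `parse_sentences_separately` and its parameters are made up for illustration and are not part of scispacy or pysbd. It takes the segmenter and the parser as plain callables, so it does not depend on how either library tokenizes.

```python
from typing import Callable, Iterable, List, TypeVar

Parsed = TypeVar("Parsed")


def parse_sentences_separately(
    text: str,
    split_sentences: Callable[[str], Iterable[str]],
    parse: Callable[[str], Parsed],
) -> List[Parsed]:
    """Split `text` into sentences first, then parse each sentence on its own.

    Hypothetical helper: because the parser only ever sees one sentence at a
    time, a sentence boundary can never land inside one of its tokens.
    """
    parsed = []
    for sentence in split_sentences(text):
        sentence = sentence.strip()
        if sentence:  # skip empty or whitespace-only segments
            parsed.append(parse(sentence))
    return parsed


# Wiring with the real libraries (not run here; assumes pysbd and a
# scispacy model such as en_core_sci_sm are installed):
#
#   import pysbd
#   import spacy
#
#   segmenter = pysbd.Segmenter(language="en", clean=False)
#   nlp = spacy.load("en_core_sci_sm")
#   docs = parse_sentences_separately(text, segmenter.segment, nlp)
```

One trade-off to keep in mind: you get one `Doc` per sentence instead of a single `Doc` for the whole abstract, so character offsets are relative to each sentence and anything that needs cross-sentence context will not see it.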
This should be fixed now because we've upgraded pysbd 👍
On the following abstract:
calling nlp(text) results in this error
This is using:
on version 0.2.4.