Adds a feature to check for and correct incorrect sentence splits produced by spacy/scispacy, where the error is indicated by a relation or an entity being split across multiple sentences in the original tokenization. I have noticed this most frequently in scientific text, around splits on periods that are actually part of abbreviations; for example, "Pseudomonas syringae pv. tabaci" is split at the period into two different sentences.
This correction prevents downstream errors when running DyGIE++ models: if left uncorrected, documents with incorrect sentence splits throw an exception, because the affected annotations look like cross-sentence relations/entities.
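
As a rough illustration of the idea (not the code in this PR), the check can be sketched as follows. The function name `merge_split_sentences` and the input shapes are assumptions made for this example: sentences are lists of tokens, and each annotation is reduced to an inclusive `(start, end)` pair of document-level token indices. A relation can be handled the same way by passing the span that runs from the start of its first entity to the end of its second.

```python
from bisect import bisect_right
from itertools import accumulate


def merge_split_sentences(sentences, spans):
    """Merge adjacent sentences whenever an annotated span crosses their boundary.

    sentences: list of token lists, as produced by the sentence splitter.
    spans: iterable of (start, end) inclusive document-level token indices
           (hypothetical input shape, chosen for this sketch).
    """
    # starts[i] is the document-level index of the first token of sentence i.
    lengths = [len(sent) for sent in sentences]
    starts = [0] + list(accumulate(lengths))[:-1]

    # join_next[i] is True when sentence i must be merged with sentence i + 1.
    join_next = [False] * len(sentences)
    for start, end in spans:
        first = bisect_right(starts, start) - 1  # sentence holding the span start
        last = bisect_right(starts, end) - 1     # sentence holding the span end
        for i in range(first, last):
            join_next[i] = True

    # Rebuild the sentence list, gluing together the flagged neighbours.
    merged, current = [], []
    for sent, join in zip(sentences, join_next):
        current.extend(sent)
        if not join:
            merged.append(current)
            current = []
    return merged


sentences = [
    ["Pseudomonas", "syringae", "pv", "."],
    ["tabaci", "infects", "tobacco", "."],
    ["It", "spreads", "."],
]
# The entity "Pseudomonas syringae pv. tabaci" covers tokens 0 through 4,
# so it straddles the first (incorrect) sentence boundary.
entity_spans = [(0, 4)]
fixed = merge_split_sentences(sentences, entity_spans)
```

Because merging only concatenates token lists, document-level token indices are unchanged, so the annotations themselves need no re-indexing under this assumption.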