Closed thoppe closed 7 years ago
Here's a line copied directly from Wikipedia; it should be a simple grammar fix.
import nlpre
P = nlpre.seperate_reference()
text = '''There are at least eight distinct types of modifications found on histones (see the legend box on the top left of the figure). Enzymes have been identified for acetylation,[2] methylation,[3] demethylation,[4] phosphorylation,[5] ubiquitination,[6] sumoylation,[7] ADP-ribosylation,[8] deimination,[9][10] and proline isomerization.[11]'''
print(P(text))
Which gives
There are at least eight distinct types of modifications found on histones (see the legend box on the top left of the figure) . Enzymes have been identified for acetylation,[2] methylation,[3] demethylation,[4] phosphorylation,[5] ubiquitination,[6] sumoylation,[7] ADP-ribosylation,[8] deimination,[9][10] and proline isomerization .
i.e., none of the references have been removed.
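For comparison, here is a minimal sketch of the output I would expect for this kind of input. It is not how nlpre works internally: `strip_bracket_references` and `BRACKET_REF` are hypothetical names made up for this example, and the sketch assumes references appear only as bracketed integers such as `[2]` or `[9][10]`.

```python
import re

# Hypothetical helper, not part of nlpre. Matches one or more adjacent
# bracketed numeric markers, e.g. "[2]" or "[9][10]".
BRACKET_REF = re.compile(r"(?:\[\d+\])+")


def strip_bracket_references(text):
    """Return (cleaned_text, references) for Wikipedia-style [n] markers."""
    # Collect every reference number in order of appearance.
    references = [int(n) for n in re.findall(r"\[(\d+)\]", text)]
    # Drop the markers themselves, leaving the surrounding punctuation.
    cleaned = BRACKET_REF.sub("", text)
    return cleaned, references


sample = (
    "acetylation,[2] methylation,[3] deimination,[9][10] "
    "and proline isomerization.[11]"
)
cleaned, refs = strip_bracket_references(sample)
print(cleaned)  # acetylation, methylation, deimination, and proline isomerization.
print(refs)     # [2, 3, 9, 10, 11]
```

A pattern this blunt would also strip legitimate bracketed numbers (for example an index like `x[1]` written in prose), so it only illustrates the target behavior for this case.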
Longer biomedical texts often include references that are concatenated with the regular text. This module aims to either remove or partition out the references. For example
Add more examples as comments to this issue as they are identified.