ICLRandD closed 4 years ago
You might be interested in an alternative Python implementation of Schwartz-Hearst which handles this scenario.
https://github.com/philgooch/abbreviation-extraction
E.g.
pip install abbreviations
In [1]: from abbreviations import schwartz_hearst
In [2]: schwartz_hearst.extract_abbreviation_definition_pairs(doc_text='The Proceeds of Crime Act 2002 ("PoCA 2002")')
Out[2]: {'PoCA 2002': 'Proceeds of Crime Act 2002'}
The current implementation of the AbbreviationDetector() does not handle abbreviations that contain a short form followed by a space followed by a number. For example, in this scenario:
The abbreviation is not matched.
The original implementation in scispaCy does not appear to have been built to handle instances in which the short form is bounded by quote marks.
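One way to handle a quote-bounded short form is to strip the quote marks from the parenthesised candidate before matching it against the preceding text. The following is a minimal, standard-library sketch of that idea using a simplified Schwartz-Hearst right-to-left match. It is not the code used by scispaCy or by the abbreviations package; the names QUOTES, find_long_form and extract_pairs are hypothetical and introduced only for illustration.

```python
import re

# Quote characters that may wrap a short form inside the parentheses
# (assumption: straight and curly single/double quotes are enough here).
QUOTES = "\"'\u201c\u201d\u2018\u2019"


def find_long_form(short_form, preceding_text):
    """Simplified Schwartz-Hearst match, scanning both strings right to left.

    Returns the candidate long form, or None if the short form's
    characters cannot all be found in the preceding text.
    """
    s = len(short_form) - 1
    l = len(preceding_text) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            # Spaces and punctuation in the short form are skipped.
            s -= 1
            continue
        # Move left until the character matches; the first character of
        # the short form must additionally sit at the start of a word.
        while l >= 0 and (
            preceding_text[l].lower() != ch
            or (s == 0 and l > 0 and preceding_text[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    return preceding_text[l + 1:].strip()


def extract_pairs(text):
    """Return {short_form: long_form} for parenthesised candidates in text."""
    pairs = {}
    for match in re.finditer(r"\(([^()]+)\)", text):
        # Strip whitespace, then surrounding quote marks, before matching.
        short_form = match.group(1).strip().strip(QUOTES).strip()
        preceding = text[:match.start()].rstrip()
        long_form = find_long_form(short_form, preceding)
        if short_form and long_form:
            pairs[short_form] = long_form
    return pairs


print(extract_pairs('The Proceeds of Crime Act 2002 ("PoCA 2002")'))
# {'PoCA 2002': 'Proceeds of Crime Act 2002'}
```

Because non-alphanumeric characters in the short form are skipped during matching, the space between "PoCA" and "2002" does not block the match, and the digits are matched against the trailing "2002" of the long form. The sketch omits the length and word-count checks of the full algorithm, so it would accept some candidates that a real implementation rejects.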