allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.72k stars, 229 forks

Abbreviation detector doesn't work properly #406

Closed: veilupt closed this issue 2 years ago

veilupt commented 2 years ago

Abbreviation detection in scispacy doesn't work properly when the input is as follows:

(SBMA) is a gradually progressive neuromuscular disorder in which degeneration of lower motor neurons results in muscle weakness, muscle atrophy, and fasciculations. SBMA occurs only in males.

It works as expected when the input is as follows:

Spinal and bulbar muscular atrophy (SBMA) is a gradually progressive neuromuscular disorder in which degeneration of lower motor neurons results in muscle weakness, muscle atrophy, and fasciculations. SBMA occurs only in males.

dakinggg commented 2 years ago

This is working as intended. An abbreviation can only be detected if both the long form and short form exist in the text.
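
To illustrate why the first input yields nothing: scispacy's abbreviation detector is based on the Schwartz and Hearst approach, which looks for a short form in parentheses and searches the text just before it for a matching long form. Below is a simplified, self-contained sketch of that idea in plain Python. It is not scispacy's actual code, and the names `best_long_form` and `detect_abbreviations` are hypothetical, chosen only for this example.

```python
import re


def best_long_form(short, candidate):
    """Scan right to left, matching each short-form character in the candidate.

    Simplified from the Schwartz and Hearst heuristic; returns None when the
    candidate text cannot account for every character of the short form.
    """
    s_i = len(short) - 1
    l_i = len(candidate) - 1
    while s_i >= 0:
        c = short[s_i].lower()
        if not c.isalnum():
            s_i -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while l_i >= 0 and (
            candidate[l_i].lower() != c
            or (s_i == 0 and l_i > 0 and candidate[l_i - 1].isalnum())
        ):
            l_i -= 1
        if l_i < 0:
            return None
        l_i -= 1
        s_i -= 1
    # Start the long form at the beginning of the word holding the last match.
    start = candidate.rfind(" ", 0, l_i + 1) + 1
    return candidate[start:]


def detect_abbreviations(text):
    """Return {short_form: long_form} for every 'long form (SHORT)' pattern."""
    found = {}
    for match in re.finditer(r"\(([^()]+)\)", text):
        short = match.group(1).strip()
        if not (2 <= len(short) <= 10 and any(ch.isalpha() for ch in short)):
            continue
        # Only a short window of words before the parenthesis is considered.
        words = text[: match.start()].split()
        window = min(len(short) + 5, len(short) * 2)
        candidate = " ".join(words[-window:])
        long_form = best_long_form(short, candidate)
        if long_form:
            found[short] = long_form
    return found


with_long = "Spinal and bulbar muscular atrophy (SBMA) is a gradually progressive neuromuscular disorder."
without_long = "(SBMA) is a gradually progressive neuromuscular disorder."

print(detect_abbreviations(with_long))     # {'SBMA': 'Spinal and bulbar muscular atrophy'}
print(detect_abbreviations(without_long))  # {}
```

In the second input there is no text before the parenthesis, so the search has no candidate long form to match against and nothing is returned. The detector pairs forms that are both present in the text; it does not look expansions up from an external source.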

veilupt commented 2 years ago

@dakinggg We were under the impression that, given an abbreviation, the detector would provide the expanded form automatically. That is what we are trying to achieve, in order to normalize the data and make it more understandable. Is there another component that would enable this behavior?