allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Resources for Abbreviation Disambiguation in Scispacy? #517

Closed: dagmawinegesse12 closed this issue 3 weeks ago

dagmawinegesse12 commented 1 month ago

Hello Scispacy Team,

I am currently exploring Scispacy for processing medical texts and am particularly interested in the AbbreviationDetector component. I would like to better understand the resources used to disambiguate abbreviations, such as dictionaries or other structured resources that might be involved in this process.

Could you provide details on the following:

  1. Does Scispacy utilize a specific dictionary or database for mapping abbreviations to their expanded forms?

  2. If such a resource exists, is it available for review or export? I am interested in examining how comprehensive and up-to-date it is.

  3. Additionally, any guidance on how the system handles disambiguation of abbreviations in varied contexts would be greatly appreciated.

Thank you in advance!

dakinggg commented 3 weeks ago

The entire abbreviation detection code is here (https://github.com/allenai/scispacy/blob/main/scispacy/abbreviation.py). There are no dictionary components.
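Since there is no dictionary, the detector can only expand abbreviations that the document defines itself. According to the scispaCy README, AbbreviationDetector implements the algorithm from Schwartz & Hearst (2003), "A simple algorithm for identifying abbreviation definitions in biomedical text". That algorithm looks for a long form followed by a short form in parentheses and checks that the short form's characters appear, in order, inside the long form. The sketch below is a simplified, pure-Python illustration of that idea. It is not scispaCy's code. The function names (`find_best_long_form`, `detect_abbreviations`, `resolve_mentions`) are hypothetical, and several of the paper's extra rules are omitted, such as sentence splitting and the reversed "short form (long form)" pattern.

```python
import re
from typing import Dict, List, Optional, Tuple

# A parenthesized span with no nested parentheses, e.g. "(SBMA)".
_PAREN = re.compile(r"\(([^()]+)\)")


def find_best_long_form(short: str, long: str) -> Optional[str]:
    """Schwartz & Hearst style match: each alphanumeric character of `short`
    must appear in `long`, in the same order, scanning right to left. The
    first character of `short` must also start a word in `long`."""
    s_idx, l_idx = len(short) - 1, len(long) - 1
    while s_idx >= 0:
        ch = short[s_idx].lower()
        if not ch.isalnum():  # skip punctuation such as '-' in "IL-2"
            s_idx -= 1
            continue
        # Move left until the character matches. The first short-form
        # character must additionally sit at a word boundary.
        while (l_idx >= 0 and long[l_idx].lower() != ch) or (
            s_idx == 0 and l_idx > 0 and long[l_idx - 1].isalnum()
        ):
            l_idx -= 1
        if l_idx < 0:
            return None  # no consistent long form in this window
        l_idx -= 1
        s_idx -= 1
    # Extend back to the start of the word holding the first match.
    start = long.rfind(" ", 0, l_idx + 1) + 1
    return long[start:]


def is_short_form_candidate(text: str) -> bool:
    # Heuristics from the paper: 2-10 characters, at most two words,
    # starts with an alphanumeric character, contains at least one letter.
    return (
        2 <= len(text) <= 10
        and len(text.split()) <= 2
        and text[0].isalnum()
        and any(c.isalpha() for c in text)
    )


def detect_abbreviations(text: str) -> Dict[str, str]:
    """Return {short form: long form} for definitions found in `text` itself."""
    pairs: Dict[str, str] = {}
    for m in _PAREN.finditer(text):
        short = m.group(1).strip()
        if not short or not is_short_form_candidate(short):
            continue
        # Look at most min(|SF| + 5, 2 * |SF|) words back, as in the paper.
        max_words = min(len(short) + 5, len(short) * 2)
        window = " ".join(text[: m.start()].split()[-max_words:])
        long = find_best_long_form(short, window)
        # Keep only the first definition of each short form.
        if long and len(long) > len(short) and short not in pairs:
            pairs[short] = long
    return pairs


def resolve_mentions(text: str, pairs: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Link every occurrence of a defined short form to its in-document long form."""
    hits = []
    for short, long in pairs.items():
        for m in re.finditer(r"\b" + re.escape(short) + r"\b", text):
            hits.append((m.start(), short, long))
    return sorted(hits)


if __name__ == "__main__":
    doc = (
        "Spinal and bulbar muscular atrophy (SBMA) is an inherited motor "
        "neuron disease. SBMA is caused by a CAG repeat expansion."
    )
    pairs = detect_abbreviations(doc)
    print(pairs)  # {'SBMA': 'Spinal and bulbar muscular atrophy'}
    for start, short, long in resolve_mentions(doc, pairs):
        print(start, short, "->", long)
```

This design explains the answer to the context question. Disambiguation is local to each document: a short form resolves to whatever long form the same text defined. An abbreviation that is never defined in the text (here, "CAG") is simply not expanded. Per the README, the real component is registered with `nlp.add_pipe("abbreviation_detector")` after `from scispacy.abbreviation import AbbreviationDetector`. Detected spans then appear on `doc._.abbreviations`, with the expansion on each span's `._.long_form`.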