A: Indirectly, disambiguate a name for a bioentity (e.g. gene) more accurately
Q: What information must the user provide to use the feature?
A: (1) Article information (2) names of bioentities
Q: What are the applicable constraints, e.g. compatibility or performance?
A: There are three main cases to consider:
Default: No prior information is available
Bioentity database identifiers are available
Species information is available
Q: How does this feature affect each class of user (persona)?
A: Synonyms and orthologues account for a large proportion (30%) of observed errors. Other types of errors (e.g. spelling issues) could conceivably be mitigated as well, and hints could enable features such as a true "type-ahead" autocomplete.
Uses
curation: in normalization
post-submission: in an automated error flagging system (even if not available at curation time)
triage: e.g. classifier to more accurately identify potential articles and authors
information extraction: e.g. context provided by authors
Users
Biologist: Eventually, better search across deposited data, better discovery
Editor: Increased quality and trust in the accuracy of Biofactoid data
Computational biologist: Increased fidelity of Biofactoid data, better data integration
Curator: Increased fidelity of Biofactoid curation
Specification
Sources of bioentity information
Considerations
Entity types
Consistent concepts (gene product, family)
Compatible Identifiers
Scope
Accuracy (curated vs NLP)
Format (file, web service)
Latency (seconds)
Hardware (GPU)
Providers
Curated
PubMed
Natural Language Processing
PubTator3
Reach
Scoring algorithm
To be determined. The algorithm should consider:
Location: Prioritization based on mention in title vs abstract vs body
Type: Local hint (e.g. entity database IDs) vs global (e.g. species)
Reliability of source
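Since the algorithm is still undecided, the following is only a minimal sketch of one way the three considerations could be combined, assuming each is mapped to a weight and the weights are multiplied. The weight values, the category names ("local", "global", "curated", "nlp") and the function name `hint_weight` are all hypothetical and are not taken from factoid or grounding-search.

```python
# Hypothetical weights: none of these values come from the project.
SECTION_WEIGHT = {"title": 1.0, "abstract": 0.6, "body": 0.3}   # Location
SCOPE_WEIGHT = {"local": 1.0, "global": 0.5}                    # Type
SOURCE_RELIABILITY = {"curated": 1.0, "nlp": 0.7}               # Reliability


def hint_weight(section: str, scope: str, source: str) -> float:
    """Combine the three considerations into one multiplicative weight in [0, 1]."""
    return SECTION_WEIGHT[section] * SCOPE_WEIGHT[scope] * SOURCE_RELIABILITY[source]


# A curated database ID mentioned in the title outweighs
# an NLP-derived species mention found only in the body.
strong = hint_weight("title", "local", "curated")
weak = hint_weight("body", "global", "nlp")
print(strong > weak)
```

A multiplicative form is just one option; it means a hint that is weak on any single consideration is discounted overall. An additive or learned scheme would be equally compatible with the considerations listed above.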
Tasks
The factoid project should be responsible solely for obtaining bioentity hints for a given article:
[x] Define a Hint model
[x] PubTator3 Hint provider
[ ] Organism Hint ranking
[ ] General Hints API
[ ] Retrieve and store Hints on create/update of Document
[ ] Augment grounding-search query with Hints
At least for network curation, grounding-search should be responsible for scoring search hits in light of hints.
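To make this division of labour concrete, here is a rough sketch of a Hint record, an organism ranking, and a re-ordering of search hits. It assumes organism hints are ranked by mention count and that each search hit carries an `organism` field. The field names, the functions `rank_organisms` and `rerank`, and the shape of the hits are illustrative assumptions, not the actual factoid Hint model or the grounding-search API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """One bioentity mention reported for an article (hypothetical fields)."""
    text: str      # surface form as it appears in the article
    type: str      # entity type, e.g. "ggp" or "organism"
    xref_id: str   # identifier in the source database
    section: str   # "title", "abstract" or "body"


def rank_organisms(hints: list[Hint]) -> list[str]:
    """Order organism identifiers from most to least frequently mentioned."""
    counts = Counter(h.xref_id for h in hints if h.type == "organism")
    return [taxon for taxon, _ in counts.most_common()]


def rerank(hits: list[dict], organism_order: list[str]) -> list[dict]:
    """Stable sort of search hits: hinted organisms first, in hint order."""
    position = {taxon: i for i, taxon in enumerate(organism_order)}
    return sorted(hits, key=lambda hit: position.get(hit["organism"], len(position)))


# Illustrative data: NCBI Taxonomy 10090 (mouse) and 9606 (human).
hints = [
    Hint("mouse", "organism", "10090", "abstract"),
    Hint("mice", "organism", "10090", "body"),
    Hint("human", "organism", "9606", "body"),
]
hits = [
    {"id": "7157", "organism": "9606"},    # human TP53
    {"id": "22059", "organism": "10090"},  # mouse Trp53
]
organism_order = rank_organisms(hints)
reranked = rerank(hits, organism_order)
print(organism_order, [hit["id"] for hit in reranked])
```

In this sketch the first function stands in for the factoid side (collecting and ranking hints) and `rerank` stands in for the grounding-search side (scoring hits in light of hints). Hits from organisms with no hint keep their original relative order at the end of the list.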
References
Entity normalization
Chen, L., Liu, H. & Friedman, C. Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics 21, 248–256 (2005)
Gyori, B. M. et al. Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinform Adv 2, (2022)
Wei, C.-H. et al. GNorm2: an improved gene name recognition and normalization system. Bioinformatics 39, btad599 (2023)
Entity identification
Luo, L. et al. AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning. Bioinformatics 39, (2023)
Species
Pafilis, E. et al. The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8, e65390 (2013)
Wei, C.-H. et al. SR4GN: A Species Recognition Software Tool for Gene Normalization. PLoS ONE 7, e38460 (2012)
Luo, L. et al. Assigning species information to corresponding genes by a sequence labeling framework. Database 2022, baac090 (2022)
Applications
Wei, C.-H. et al. PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge. arXiv (2024)