allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.66k stars, 223 forks

NER-classes: Where to find definitions? #500

Closed raven44099 closed 7 months ago

raven44099 commented 8 months ago

Hi, I'm an enthusiastic user of your library!

On your website you state that there is a GGP class in this dataset.

| Model | F1 | Entity Types |
| --- | --- | --- |
| en_ner_craft_md | 77.56 | GGP, SO, TAXON, CHEBI, GO, CL |

However, the original CRAFT paper doesn't have this class:

| Terminology | Total Annotations |
| --- | --- |
| ChEBI | 8,137 |
| CL | 5,760 |
| Entrez Gene | 12,277 |
| GO BP | 16,184 |
| GO CC | 8,354/4,707 |
| GO MF | 4,062 |
| NCBITaxon | 7,449 |
| PRO | 15,594 |
| SO | 22,090 |
| All | 99,907 |

I tried to find the mapping, but was not successful. Where can I find information about the definitions of the classes used for your NER models?

dakinggg commented 7 months ago

I believe this should contain the information you are looking for: https://github.com/cambridgeltl/MTL-Bioinformatics-2016/blob/master/Additional%20file%201.pdf. GGP specifically stands for gene/gene product.
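
For readers who want to check the labels programmatically, here is a minimal sketch, not an official scispacy utility. The names `LABEL_GLOSSARY`, `describe`, and `model_labels` are hypothetical. Only the GGP expansion is confirmed in this thread; the other expansions are assumptions inferred from the ontology names in the CRAFT annotation table quoted in the question. `model_labels` relies on spaCy's `nlp.get_pipe("ner").labels` and only works if spaCy and the `en_ner_craft_md` package are installed.

```python
# Glossary for the en_ner_craft_md labels listed on the scispacy website.
# Only the GGP expansion comes from this thread; the others are ASSUMED
# from the ontology names in the CRAFT annotation table quoted in the question.
LABEL_GLOSSARY = {
    "GGP": "gene or gene product",
    "SO": "Sequence Ontology term",
    "TAXON": "NCBI Taxonomy term",
    "CHEBI": "ChEBI (Chemical Entities of Biological Interest) term",
    "GO": "Gene Ontology term",
    "CL": "Cell Ontology term",
}


def describe(label):
    """Return a human-readable description, or a fallback for unknown labels."""
    return LABEL_GLOSSARY.get(label.upper(), "unknown label")


def model_labels(model_name="en_ner_craft_md"):
    """List the labels the installed model's NER component can emit.

    Requires spaCy and the model package to be installed; the import is
    kept local so the glossary is usable without them.
    """
    import spacy

    nlp = spacy.load(model_name)
    return sorted(nlp.get_pipe("ner").labels)


if __name__ == "__main__":
    for label in LABEL_GLOSSARY:
        print(f"{label}: {describe(label)}")
```

With the model installed, comparing `model_labels()` against the keys of `LABEL_GLOSSARY` is a quick way to spot any label the glossary does not cover.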