allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Different results between `Scispacy - Demo` and my code for the same sentence #475

Closed jin-deng closed 1 year ago

jin-deng commented 1 year ago

Hello everyone,

text = "we do expect (06:27) submitted for Rayaldee in Europe to get those approvals and launch the product in Europe this year. "

The Scispacy - Demo works well. With the en_ner_bc5cdr_md model selected, its NER output is what I want: Rayaldee is correctly recognized as an entity with the label CHEMICAL.

Rayaldee CHEMICAL 8 9 35 43


However, when I replicate it locally, Rayaldee is only tagged as PROPN and no entity is identified. Does anyone have an idea why this is the case? Thanks a lot!

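import scispacy
import spacy

# Load the same model that was selected in the demo
ner = spacy.load("en_ner_bc5cdr_md")

text = "we do expect (06:27) submitted for Rayaldee in Europe to get those approvals and launch the product in Europe this year. "

doc = ner(text)

# Per-token part-of-speech tag and entity type (empty when the token is outside any entity)
for token in doc:
    print(token.text, token.pos_, token.ent_type_)

# Entity spans with character offsets and labels (prints nothing here, since no entity is found)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Output:

we PRON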
do AUX 
expect VERB 
( PUNCT 
06:27 NUM 
) PUNCT 
submitted VERB 
for ADP 
Rayaldee PROPN 
in ADP 
Europe PROPN 
to PART 
get VERB 
those DET 
approvals NOUN 
and CCONJ 
launch VERB 
the DET 
product NOUN 
in ADP 
Europe PROPN 
this DET 
year NOUN 
. PUNCT

dakinggg commented 1 year ago

Hi @JinDenguchicago, the demo uses an older version of scispacy, and I no longer have access to the demo page to update it to the latest scispacy version. This means that results will differ between the demo page and the latest scispacy version. Please see https://github.com/allenai/scispacy/issues/342 for more discussion. I will also update the README to note this. As for why the newer version is "worse", I don't know the answer. It might just be randomness on this example. On our eval sets the results were similar, but it is entirely possible that something changed for the worse between spacy 2.x and spacy 3.x training.
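
To see which versions are actually installed locally before comparing against the demo, the package metadata can be read directly. The following is only a minimal sketch and not part of scispacy: `report_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about how the packages were installed.

```python
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match `pip list`.
    names = ["spacy", "scispacy", "en_ner_bc5cdr_md"]
    for name, version in report_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

A loaded spaCy pipeline also carries its own metadata in `nlp.meta`; the `version` and `spacy_version` entries there should indicate which model release was loaded and which spaCy range it targets.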

jin-deng commented 1 year ago

Hi @dakinggg, thanks so much for your explanation! Do you know where I could get a link to the previous version, like the one given in https://github.com/allenai/scispacy/issues/342#issuecomment-804993320? I want to try the previous en_ner_bc5cdr_md NER model.

Second EDIT: (I just tried the pip install with the specific version link and it works.)
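
For anyone following the same route, here is a rough sketch of how such a version-specific link could be built. It assumes the release URL pattern used in the scispacy README install instructions; `scispacy_model_url` is a hypothetical helper, and the version `0.2.4` is purely illustrative, not the version the demo is confirmed to run.

```python
def scispacy_model_url(model, version):
    """Build a model download URL, assuming the README's release URL pattern."""
    base = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases"
    return f"{base}/v{version}/{model}-{version}.tar.gz"


# Illustrative version only; substitute the release you actually want to test.
print("pip install " + scispacy_model_url("en_ner_bc5cdr_md", "0.2.4"))
```

An older model will most likely also need the scispacy and spaCy versions it was built against, so a separate virtual environment is probably the safest place to try it.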