allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Entity Linker takes a while to process #390

Closed (farrandi closed this issue 3 years ago)

farrandi commented 3 years ago

I tried using the entity linker with UMLS from scispacy, and it takes a while on the first run: about 14 s. The second run is noticeably faster, about 30 ms. I assume it's cached?

Here is the code I ran:

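%%time
# Assumes `nlp` is a scispacy pipeline that already has the
# "scispacy_linker" component added (set up in an earlier cell).
import spacy

doc = nlp("arrhythmia")
spacy.displacy.render(doc, style="ent", jupyter=True)

# Take the first entity detected in the text.
entity = doc.ents[0]
print("Name: ", entity)

# Each kb_ents item is a (CUI, score) pair; look up the full UMLS entry by CUI.
linker = nlp.get_pipe("scispacy_linker")
for umls_ent in entity._.kb_ents:
    print(umls_ent)
    print(linker.kb.cui_to_entity[umls_ent[0]])
    print("----------------------")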

and it outputted:

..(the printed values)...
CPU times: user 0 ns, sys: 297 ms, total: 297 ms
Wall time: 13.8 s

Is this normal? Is there a way to make the results appear faster? In the Streamlit demo, the results seem to come out instantaneously when I type words into the textbox. How is it done there?

dakinggg commented 3 years ago

The first time is slow because it needs to download a bunch of files. It should be faster on all subsequent runs.
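
To illustrate the general pattern of "slow the first time, fast afterwards", here is a minimal sketch using only the Python standard library. It is not how scispacy or the Streamlit demo is implemented. `load_resource` and `lookup` are hypothetical names, and the `time.sleep` call is an assumed stand-in for a one-time download or load step.

```python
import time
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=None)
def load_resource(name: str) -> dict:
    """Hypothetical stand-in for an expensive one-time setup step."""
    time.sleep(0.2)  # simulate a slow download/load (assumed cost)
    return {"name": name, "entries": {"arrhythmia": "placeholder entry"}}


def lookup(term: str) -> Optional[str]:
    # The first call pays the setup cost; later calls reuse the cached object.
    resource = load_resource("demo_kb")
    return resource["entries"].get(term)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


first_result, first_seconds = timed(lookup, "arrhythmia")
second_result, second_seconds = timed(lookup, "arrhythmia")
print(f"first call:  {first_seconds:.3f} s")
print(f"second call: {second_seconds:.3f} s")
```

The same idea applies to an interactive app: if the expensive object is created once and kept alive between requests, only the very first request pays the setup cost.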