allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

Is it possible to pre-download data file for UMLS? #454

Closed rjiang9 closed 1 year ago

rjiang9 commented 1 year ago

In scispacy's UMLS example code, the comments say that it has to download ~1GB of data and load a large JSON file (the knowledge base).

I am wondering whether the data files can be downloaded beforehand. If so, which files do I need to download, and where should I put them? I ask because the server where I run my scispacy program does not allow downloads from the Internet, but I can download the files to my workstation in advance and upload them to the server.

Thanks.

FYI, the relevant part of the example code is:

# This line takes a while, because we have to download ~1GB of data
# and load a large JSON file (the knowledge base). Be patient!
# Thankfully it should be faster after the first time you use it, because
# the downloads are cached.
# NOTE: The resolve_abbreviations parameter is optional, and requires that
# the AbbreviationDetector pipe has already been added to the pipeline. Adding
# the AbbreviationDetector pipe and setting resolve_abbreviations to True means
# that linking will only be performed on the long form of abbreviations.

nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls"})

dakinggg commented 1 year ago

Hi @rjiang9, sorry this is not easier, but it should be quite possible! Please check out #343 and try to follow the suggestions there. Let me know if that works for you!
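
As a rough illustration of the workstation side of this, here is a minimal Python sketch that mirrors a list of files into a local folder and checks that nothing is missing before the folder is copied to the offline server. It uses only the standard library and is not part of scispacy. The helper names (`local_path_for`, `download_all`, `missing_files`) are hypothetical, and the list of URLs is an input you have to supply yourself from the scispacy source. How to point the linker at the local copies is what #343 covers.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_path_for(url: str, dest_dir: Path) -> Path:
    """Map a remote file URL to a path inside dest_dir, keeping the file name."""
    return dest_dir / Path(urlparse(url).path).name


def download_all(urls: Iterable[str], dest_dir: Path) -> List[Path]:
    """Download each URL into dest_dir (run this where Internet access is allowed).

    Files that already exist locally are skipped, so the call can be repeated
    after an interrupted run.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    for url in urls:
        target = local_path_for(url, dest_dir)
        if not target.exists():
            urlretrieve(url, str(target))
        targets.append(target)
    return targets


def missing_files(urls: Iterable[str], dest_dir: Path) -> List[Path]:
    """Return the expected local paths that are not present in dest_dir."""
    expected = (local_path_for(url, dest_dir) for url in urls)
    return [path for path in expected if not path.is_file()]
```

On the workstation you would call `download_all(urls, Path("umls_files"))` with the real URL list (the folder name is only an example), copy the folder to the server, and run `missing_files(urls, Path("umls_files"))` there as a sanity check before configuring the linker.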

dakinggg commented 1 year ago

Closing this since it should be possible via #343, but feel free to reopen if you have questions!