Open davidshumway opened 2 years ago
I think you actually can do this, although admittedly I have not tried it. Can you try setting the SCISPACY_CACHE environment variable (used on this line https://github.com/allenai/scispacy/blob/3d153ddad1f11f000f961f7a92c0d862b93c0973/scispacy/file_cache.py#L16) to whatever folder you want to use, before importing the library?
Makes sense.
So it seems to pretty much be working with a bit of a workaround.
The files are initially cached to /root/.scispacy/datasets/.
After caching, move the cache folder to a permanent folder on Google drive:
!mv /root/.scispacy/ /content/gdrive/MyDrive/test/
!ls /content/gdrive/MyDrive/test/.scispacy/
>>> datasets
To update the environment variable, as described:
import os
os.environ['SCISPACY_CACHE'] = '/content/gdrive/MyDrive/test/.scispacy/'
However, this alone does not pick up the cached files; they are downloaded again. For the new environment variable to take effect, it's necessary to restart the runtime: Runtime->Restart runtime.
Now when running the entity linker, it will see the permanently cached files.
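The restart is presumably needed because the cache location is worked out once, when the library is first imported, so changing the variable afterwards does not affect the already-imported module. Here is a minimal sketch of that pattern. `resolve_cache_root` is a hypothetical stand-in, not scispacy's actual code, and the default of `~/.scispacy` is taken from the paths mentioned in this thread:

```python
import os
from pathlib import Path


def resolve_cache_root() -> Path:
    """Hypothetical stand-in for a cache lookup done at import time."""
    default = Path.home() / ".scispacy"
    return Path(os.environ.get("SCISPACY_CACHE", str(default)))


# Simulate the first import: the variable is not set yet, so the
# default location is captured and kept for the rest of the session.
os.environ.pop("SCISPACY_CACHE", None)
CACHE_AT_IMPORT = resolve_cache_root()

# Setting the variable afterwards does not change the captured value...
os.environ["SCISPACY_CACHE"] = "/content/gdrive/MyDrive/test/.scispacy/"
STALE = CACHE_AT_IMPORT

# ...but a fresh lookup (what a new import after a restart would do) sees it.
FRESH = resolve_cache_root()
```

In practice this means the cell that sets `SCISPACY_CACHE` has to run before scispacy is imported for the first time in the session.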
So is an enhancement necessary? It'd definitely be easier and more foolproof to simply add a parameter such as cache_folder to the nlp.add_pipe() method. For example:
nlp.add_pipe(
    "scispacy_linker",
    config={
        "resolve_abbreviations": True,
        "linker_name": "umls",
        "cache_folder": "/content/gdrive/MyDrive/test/"})
which would then be used to look for a subfolder .scispacy, i.e. /content/gdrive/MyDrive/test/.scispacy/ in this case.
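To make the proposal concrete, here is a rough sketch of how such a parameter might be resolved. Everything in it is hypothetical: `cache_folder` does not exist in scispacy as far as this thread shows, and `resolve_cache_dir` and the precedence order (explicit folder, then `SCISPACY_CACHE`, then the home directory) are assumptions made for illustration:

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cache_dir(cache_folder: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the directory that holds cached files.

    Assumed precedence: explicit cache_folder, then the SCISPACY_CACHE
    environment variable, then ~/.scispacy.
    """
    if cache_folder is not None:
        # The proposal looks for a ".scispacy" subfolder of the given folder.
        return Path(cache_folder) / ".scispacy"
    env_value = os.environ.get("SCISPACY_CACHE")
    if env_value:
        return Path(env_value)
    return Path.home() / ".scispacy"


# Example with the Google Drive folder used above.
drive_cache = resolve_cache_dir("/content/gdrive/MyDrive/test/")
```

With the folder from the example, `drive_cache` points at `/content/gdrive/MyDrive/test/.scispacy`, matching the location described above.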
https://github.com/allenai/scispacy/blob/2290a80cfe0948e48d8ecfbd60064019d57a6874/scispacy/file_cache.py#L16
For Google Colab users, the Path.home() location is /root/, which is deleted when the runtime is cleared. As runtimes are cleared fairly often, this means re-downloading the KBs. Perhaps there is a way to alter Path.home from pathlib? Another option is to allow the user to enter a cache folder, which Colab users could set to their Google Drive (fwiw just a regular folder as seen by python within Colab), thus making the download permanent.
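On the question of altering Path.home: on POSIX systems (Colab runs Linux), `Path.home()` follows the `HOME` environment variable, so it can be redirected without touching pathlib itself. The sketch below demonstrates this with a temporary directory standing in for a Drive folder. Note that changing `HOME` affects every tool that relies on the home directory, so this is a blunt workaround rather than a recommendation:

```python
import os
import tempfile
from pathlib import Path

# Remember the real value so it can be restored afterwards.
original_home = os.environ.get("HOME")

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a persistent folder such as one on Google Drive.
    os.environ["HOME"] = tmp
    redirected = Path.home()  # now resolves to the temporary directory

# Restore the original HOME so the rest of the session is unaffected.
if original_home is None:
    os.environ.pop("HOME", None)
else:
    os.environ["HOME"] = original_home
```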