aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.
https://aphp.github.io/edsnlp/
BSD 3-Clause "New" or "Revised" License

Umls fixes #183

Closed percevalw closed 1 year ago

percevalw commented 1 year ago

Description

This PR disables automatic progress tracking for EDSMatcher preprocessing by default, caps the spaCy version below 4.0 because of binary incompatibilities, and tries to let public forks access the CI cache (especially the UMLS resource files).

percevalw commented 1 year ago

I merged this a bit fast, sorry.

GitHub prevents secrets from being used when the CI runs for PRs from forks, but testing with the UMLS requires the secret UMLS API key to download the resource files. This is why every external contribution has failed the CI since the UMLS matcher was added. For this reason, and to avoid downloading the UMLS on each run, I decided to cache the resources between runs, thinking public forks would benefit from this cache as well and bypass the UMLS_API_KEY requirement. But it seems that putting a secret at the top of an action makes the caches inaccessible to public forks, so I moved the secret interpolation into only the scripts that require it, hence the duplication.
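To illustrate the idea of reading the secret only where it is needed, here is a minimal Python sketch. It is not the actual CI script from this PR: `ensure_resource`, `cache_path`, and `download` are hypothetical names used for illustration, and only `UMLS_API_KEY` comes from the discussion above. The assumption is that a run with a warm cache should never touch the key, so a fork without secrets can still pass.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def ensure_resource(
    cache_path: Path,
    download: Callable[[str], bytes],
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the cached resource file, downloading it only on a cache miss.

    `download` is a hypothetical callable that takes the API key and returns
    the resource content as bytes.
    """
    env = os.environ if env is None else env

    if cache_path.exists():
        # Cache hit: the API key is never read, so a run without secrets
        # (for example a PR from a public fork) can proceed.
        return cache_path

    # Cache miss: the secret is required only at this point.
    api_key = env.get("UMLS_API_KEY")
    if not api_key:
        raise RuntimeError(
            "UMLS_API_KEY is not set and the resource cache is empty"
        )

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(download(api_key))
    return cache_path
```

With this shape, the key lookup lives inside the one function that needs it, which mirrors moving the secret interpolation out of the top-level configuration and into the individual scripts.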

And show_progress is indeed better; we can fix this later 👍