Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License
369 stars 95 forks source link

Numpy Dependency is Broken; Needs Updating to Explicitly Disallow v2 #97

Closed GChampagne1 closed 3 weeks ago

GChampagne1 commented 1 month ago

Describe the bug Numpy has released a version 2, but QuickUMLS requires version 1 (various errors are thrown otherwise). The way the requirements.txt is currently written, it allows for v2 but should not.

To Reproduce Create a venv from scratch and run poetry install then attempt to do anything with quickumls.

Environment

Additional context Line 2 of requirments.txt should be changed to something like numpy = ">=1.8.2,<2.0.0"; needs to explicitly not allow version 2.

ghisvail commented 3 weeks ago

Could you provide the actual errors thrown? They could help with finding a suitable fix for NumPy 2.

Upper bound restrictions are not a suitable solution long term, as the rest of the scientific Python ecosystem will eventually transition to version 2 only onward.

GChampagne1 commented 3 weeks ago

I agree fixing the issue would be best but what I'm suggesting is a quick one line fix just to get it running in the short term.

The error I get is

  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/quickumls/__init__.py", line 1, in <module>
    from .core import QuickUMLS
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/quickumls/core.py", line 12, in <module>
    import spacy
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/spacy/errors.py", line 3, in <module>
    from .compat import Literal
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/spacy/compat.py", line 39, in <module>
    from thinc.api import Optimizer  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/thinc/api.py", line 1, in <module>
    from .backends import (
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/thinc/backends/__init__.py", line 17, in <module>
    from .cupy_ops import CupyOps
  File "/Users/gchampagne/Library/Caches/pypoetry/virtualenvs/mag-umls-gaQ-trjz-py3.11/lib/python3.11/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
    from .numpy_ops import NumpyOps
  File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

A quick Google makes it seem like it could be an issue with spacy or some other dependency being compiled on an older (i.e. v1) version of Numpy, but I haven't read the full Numpy v1 -> v2 guide.

ghisvail commented 3 weeks ago

According to your traceback, the issue comes from thinc, a transitive dependency used though spacy. So again, the solution is not to apply an upperbound to NumPy's version, but ensure all the direct and transitive dependencies are consistent.

Now the weird thing is that if I do:

$ conda create -n spacy python=3.12
$ conda activate spacy
$ pip install spacy
Successfully installed MarkupSafe-2.1.5 annotated-types-0.7.0 blis-0.7.11 catalogue-2.0.10 certifi-2024.7.4 charset-normalizer-3.3.2 click-8.1.7 cloudpathlib-0.18.1 confection-0.1.5 cymem-2.0.8 idna-3.7 jinja2-3.1.4 langcodes-3.4.0 language-data-1.2.0 marisa-trie-1.2.0 markdown-it-py-3.0.0 mdurl-0.1.2 murmurhash-1.0.10 numpy-1.26.4 packaging-24.1 preshed-3.0.9 pydantic-2.8.2 pydantic-core-2.20.1 pygments-2.18.0 requests-2.32.3 rich-13.7.1 shellingham-1.5.4 smart-open-7.0.4 spacy-3.7.5 spacy-legacy-3.0.12 spacy-loggers-1.0.5 srsly-2.4.8 thinc-8.2.5 tqdm-4.66.5 typer-0.12.3 typing-extensions-4.12.2 urllib3-2.2.2 wasabi-1.1.3 weasel-0.4.1 wrapt-1.16.0
$ pip install quickumls
...
Requirement already satisfied: numpy>=1.8.2 in ./.local/share/micromamba/envs/spacy/lib/python3.12/site-packages (from quickumls) (1.26.4)

We can see that spacy and thinc already restrict numpy to v1.x and quickumls will happily reuse the one cached by the first install of spacy. So, in your case it looks like another dependency (or yourself) forced an update to numpy v2 instead of staying to 1.x, which quickumls / spacy / thinc are happy to stay on.

ghisvail commented 3 weeks ago

Finally, if I force an upgrade:

$ pip install numpy==2
...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
thinc 8.2.5 requires numpy<2.0.0,>=1.19.0; python_version >= "3.9", but you have numpy 2.0.0 which is incompatible.

Perhaps clearing your cache and starting a new deployment may help.

ghisvail commented 3 weeks ago

Actually, since you're using Poetry, you may try either of these options:

  1. if you have spacy=^3.7 in your pyproject file, then poetry update spacy thinc
  2. set spacy=^3.7 in your pyproject file, then 1
  3. set numpy<2 in your pyproject file, then poetry update
GChampagne1 commented 3 weeks ago

Sorry maybe I wasn't clear; I fixed it on my/my company's setup by explicitly stating numpy should be v1. I was raising the issue because as the requirements.txt written right now I think anyone just downloading this and running it will run into a similar issue.

ghisvail commented 3 weeks ago

I fixed it on my/my company's setup by explicitly stating numpy should be v1

which was the right thing to do :+1:

I think anyone just downloading this and running it will run into a similar issue

it should be fine as shown here