allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

scipy pin preventing Python 3.12 support? #519

Closed jason-nance closed 3 weeks ago

jason-nance commented 4 months ago

Hi, I'm attempting to upgrade some of our code to Python 3.12, and I've noticed a conflict involving scispacy that I think prevents installing it on Python 3.12:

Any idea if it's possible to relax the scipy constraint to allow for running this on Python 3.12? Thanks!

dakinggg commented 4 months ago

So, on later versions of scipy this code (https://github.com/allenai/scispacy/blob/021fe76d69b20523d3f94a08b447c27e1a46597e/scispacy/candidate_generation.py#L451-L453) errors with "ValueError: Output dtype not compatible with inputs." It is only needed by the code that creates the linkers, not by the code that runs them, so the library should function fine with a later version of scipy. But I'd need to dig a bit more to see how to actually resolve this issue.

dakinggg commented 4 months ago

Can be reproduced locally by running pytest tests/ with scipy>=1.11 installed

dakinggg commented 4 months ago

It seems like they don't support this, so it may take a bit of work to decide what to do: https://github.com/scipy/scipy/issues/7408

jason-nance commented 4 months ago

Thanks for the quick response! Yikes... glad it's a relatively small incompatibility, but that's a tricky one.

We use a pretty strict build system, so I can't override the constraint even if the library works OK with newer scipy. I'll attempt to find a solution for the float16 issue if upgrading becomes a blocker for us.

dakinggg commented 4 months ago

I think I'd be fine removing the pin from the library and just adding a check on that code path (so you'd have to downgrade scipy in order to recreate the linkers). Of course, I would prefer a less brittle solution.
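For illustration, a check like the one described above could look roughly like the sketch below. The function names and the (1, 11) threshold are assumptions made for this example: the threshold comes from the "scipy>=1.11" reproduction mentioned earlier in the thread and may not be the exact boundary, and none of this is existing scispacy code.

```python
from typing import Tuple

# Assumption for this sketch: the first scipy release on which linker creation
# fails, taken from the "scipy>=1.11" reproduction reported in this thread.
FIRST_FAILING_SCIPY: Tuple[int, ...] = (1, 11)


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a version such as '1.11.0rc1' into (1, 11, 0).

    Only the leading digits of each dot-separated piece are kept, and parsing
    stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def require_scipy_for_linker_creation(
    installed: str,
    first_failing: Tuple[int, ...] = FIRST_FAILING_SCIPY,
) -> None:
    """Raise early if the installed scipy is too new to recreate the linkers.

    Hypothetical helper: it would be called only on the linker-creation code
    path, so that normal use of the library is unaffected by the scipy version.
    """
    if parse_release(installed) >= first_failing:
        wanted = ".".join(str(part) for part in first_failing)
        raise RuntimeError(
            f"Recreating the linkers needs scipy<{wanted}, "
            f"but scipy {installed} is installed."
        )
```

In practice the `installed` argument would come from `scipy.__version__` or `importlib.metadata.version("scipy")`. Taking it as a plain string keeps the check free of imports and easy to test.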

adelavega commented 3 months ago

I'm not able to replicate a functional environment for CandidateGenerator, even using the instructions for a 3.9 conda environment. Strangely this was working recently so I'm not sure if something else changed.

conda create -n scispacy2 python=3.9
conda activate scispacy2
pip install scispacy
>>> from scispacy.candidate_generation import CandidateGenerator
>>> generator = CandidateGenerator(name='umls')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scispacy/candidate_generation.py", line 221, in __init__
    self.ann_index = ann_index or load_approximate_nearest_neighbours_index(
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scispacy/candidate_generation.py", line 132, in load_approximate_nearest_neighbours_index
    concept_alias_tfidfs = scipy.sparse.load_npz(
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scipy/sparse/_data.py", line 72, in astype
    self._deduped_data().astype(dtype, casting=casting, copy=copy),
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scipy/sparse/_data.py", line 32, in _deduped_data
    self.sum_duplicates()
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scipy/sparse/_compressed.py", line 1118, in sum_duplicates
    self.sort_indices()
  File "/home/alejandro/miniconda3/envs/scispacy2/lib/python3.9/site-packages/scipy/sparse/_compressed.py", line 1164, in sort_indices
    _sparsetools.csr_sort_indices(len(self.indptr) - 1, self.indptr,
ValueError: Output dtype not compatible with inputs.

Edit: it's possible this was due to a linking issue with C libraries. Still, it highlights that this incompatibility makes the environment quite brittle.
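Since the failure seems to depend on exactly which versions end up installed, it may help to attach them to reports like this one. Below is a minimal sketch using only the standard library. The helper name `installed_versions` is made up for this example, and the package list is just a guess at what is relevant here.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in installed_versions(["scispacy", "scipy", "numpy"]).items():
        print(name, version or "not installed")
```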

ulc0 commented 2 months ago

> So, on later versions of scipy this code (https://github.com/allenai/scispacy/blob/021fe76d69b20523d3f94a08b447c27e1a46597e/scispacy/candidate_generation.py#L451-L453) errors with ValueError: Output dtype not compatible with inputs.. This is only required for the code used to create the linkers, not actually for running, so the library should actually function fine with a later version of scipy. But I'd need to dig a bit more to see how to actually resolve this issue.

I get the same error with scipy==1.10.1 on Azure Databricks.