anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python
https://sites.google.com/site/anhaidgroup/projects/py_stringmatching
BSD 3-Clause "New" or "Revised" License

chore: :wrench: pin `numpy<2.0` #100

Closed odulcy-mindee closed 1 month ago

odulcy-mindee commented 2 months ago

Hi,

NumPy 2.0.0 was released today, and since then it is impossible to compile py_stringmatching.

You can reproduce this problem using the following steps:

docker run -it python:3.11-slim-bullseye bash
# In the container
apt update && apt install -y gcc
pip3 install py_stringmatching

It will fail with the following error:

Collecting py_stringmatching
  Using cached py-stringmatching-0.4.5.tar.gz (849 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy>=1.7.0 (from py_stringmatching)
  Obtaining dependency information for numpy>=1.7.0 from https://files.pythonhosted.org/packages/d1/27/2a7bd6855dc717aeec5f553073a3c426b9c816126555f8e616392eab856b/numpy-2.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
  Using cached numpy-2.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Using cached numpy-2.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.3 MB)
Building wheels for collected packages: py_stringmatching
  Building wheel for py_stringmatching (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for py_stringmatching (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [85 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/py_stringmatching
      copying py_stringmatching/__init__.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching
      copying py_stringmatching/utils.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching
      creating build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/whitespace_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/alphabetic_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/__init__.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/alphanumeric_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/delimiter_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/definition_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      copying py_stringmatching/tokenizer/qgram_tokenizer.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tokenizer
      creating build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      copying py_stringmatching/tests/__init__.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      copying py_stringmatching/tests/utils.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      copying py_stringmatching/tests/test_tokenizers.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      copying py_stringmatching/tests/test_simfunctions.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      copying py_stringmatching/tests/test_sim_Soundex.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/tests
      creating build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/dice.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/similarity_measure.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/partial_ratio.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/editex.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/token_similarity_measure.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/partial_token_sort.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/soft_tfidf.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/__init__.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/monge_elkan.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/jaro_winkler.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/phonetic_similarity_measure.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/sequence_similarity_measure.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/hamming_distance.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/ratio.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/token_sort.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/tfidf.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/affine.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/smith_waterman.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/hybrid_similarity_measure.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/cosine.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/bag_distance.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/soundex.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/jaccard.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/tversky_index.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/generalized_jaccard.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/overlap_coefficient.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/jaro.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/needleman_wunsch.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      copying py_stringmatching/similarity_measure/levenshtein.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      creating build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/__init__.py -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      running egg_info
      writing py_stringmatching.egg-info/PKG-INFO
      writing dependency_links to py_stringmatching.egg-info/dependency_links.txt
      writing requirements to py_stringmatching.egg-info/requires.txt
      writing top-level names to py_stringmatching.egg-info/top_level.txt
      reading manifest file 'py_stringmatching.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no files found matching 'requirements.txt'
      adding license file 'LICENSE'
      writing manifest file 'py_stringmatching.egg-info/SOURCES.txt'
      copying py_stringmatching/similarity_measure/cython/cython_affine.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_jaro.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_jaro_winkler.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_levenshtein.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_needleman_wunsch.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_smith_waterman.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      copying py_stringmatching/similarity_measure/cython/cython_utils.c -> build/lib.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      running build_ext
      <string>:32: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      building 'py_stringmatching.similarity_measure.cython.cython_levenshtein' extension
      creating build/temp.linux-x86_64-cpython-311
      creating build/temp.linux-x86_64-cpython-311/py_stringmatching
      creating build/temp.linux-x86_64-cpython-311/py_stringmatching/similarity_measure
      creating build/temp.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython
      gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/tmp/pip-build-env-_6gd7225/normal/lib/python3.11/site-packages/numpy/core/include -I/opt/python/include/python3.11 -c py_stringmatching/similarity_measure/cython/cython_levenshtein.c -o build/temp.linux-x86_64-cpython-311/py_stringmatching/similarity_measure/cython/cython_levenshtein.o
      py_stringmatching/similarity_measure/cython/cython_levenshtein.c:1205:10: fatal error: numpy/arrayobject.h: No such file or directory
       1205 | #include "numpy/arrayobject.h"
            |          ^~~~~~~~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for py_stringmatching
Failed to build py_stringmatching
ERROR: Could not build wheels for py_stringmatching, which is required to install pyproject.toml-based projects
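
The `gcc` line in the log passes `-I.../numpy/core/include`, while NumPy 2.0 renamed the `numpy.core` package to `numpy._core`, so the C headers now sit under `numpy/_core/include`. That would be consistent with the missing `numpy/arrayobject.h`. The following is a minimal diagnostic sketch, not the project's build code: the helper names `candidate_include_dirs`, `find_arrayobject_header`, and `numpy_include_dir` are made up for illustration, and only `numpy.get_include()` is a real NumPy call.

```python
import importlib.util
import os


def candidate_include_dirs(package_dir):
    """Header locations to probe inside an installed numpy package directory."""
    return [
        os.path.join(package_dir, "core", "include"),   # layout used by NumPy 1.x
        os.path.join(package_dir, "_core", "include"),  # layout used by NumPy 2.x
    ]


def find_arrayobject_header(package_dir):
    """Return the first include dir that contains numpy/arrayobject.h, or None."""
    for include_dir in candidate_include_dirs(package_dir):
        if os.path.isfile(os.path.join(include_dir, "numpy", "arrayobject.h")):
            return include_dir
    return None


def numpy_include_dir():
    """Ask NumPy itself where its headers are; None if NumPy is not installed."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy
    return numpy.get_include()


if __name__ == "__main__":
    # Prints the header directory of whichever NumPy this interpreter sees.
    print(numpy_include_dir())
```

Asking `numpy.get_include()` rather than hard-coding a path works on both the 1.x and 2.x layouts, which is why it is the usual choice in build scripts.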

odulcy-mindee commented 2 months ago

cc @Anson-Doan, @anhaidgroup

twall commented 2 months ago

I've recently been seeing this issue in my automated builds. Installing numpy and py_stringmatching in the same pip command fails with this error.

python -m pip install numpy==1.26.2 py_stringmatching==0.4.5

My temporary fix is to do the following:

python -m pip install numpy==1.26.2
python -m pip install --global-option=build_ext --global-option="-I$(python -c 'import numpy;print(numpy.get_include())')" py_stringmatching==0.4.5

This doesn't seem to be an issue in some other environments that aren't built from scratch, but I'm assuming that's because numpy is already installed or the required includes are present (pip install after pip uninstall worked in those cases).

EDIT

I just tried this again, omitting the extra build arguments.

python -m pip install numpy==1.26.2 py_stringmatching==0.4.5  # <-- FAILS

python -m pip install numpy==1.26.2
python -m pip install py_stringmatching==0.4.5  # <-- SUCCEEDS
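
For automated builds, the two-step order above could be scripted roughly as follows. This is only a sketch of the workaround reported in this comment: the function names `two_step_install_commands` and `run_all` are hypothetical, the version pins are the ones quoted above, and whether the two-step order succeeds in a given environment is not guaranteed.

```python
import subprocess
import sys


def two_step_install_commands(numpy_spec="numpy==1.26.2",
                              package_spec="py_stringmatching==0.4.5",
                              python=sys.executable):
    """Return the two pip invocations, in order, as argv lists."""
    pip = [python, "-m", "pip", "install"]
    # numpy goes first, in its own pip call; py_stringmatching follows.
    return [pip + [numpy_spec], pip + [package_spec]]


def run_all(commands):
    """Run each command in order, raising CalledProcessError on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(two_step_install_commands())` would then perform both installs with the current interpreter's pip.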