MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

New pip version for numpy fix #63

Closed: raffaem closed this issue 10 months ago

raffaem commented 11 months ago

I think a new version should be released on PyPI that includes the NumPy fix.

I'm getting this error when running the example:

from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

model = PolyFuzz("TF-IDF")
model.match(from_list, to_list)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 7
      4 to_list = ["apple", "apples", "mouse"]
      6 model = PolyFuzz("TF-IDF")
----> 7 model.match(from_list, to_list)

File ~/.pyenv/versions/3.11.4/envs/polyfuzz/lib/python3.11/site-packages/polyfuzz/polyfuzz.py:127, in PolyFuzz.match(self, from_list, to_list, top_n)
    125 if self.method in ["TF-IDF", "TFIDF"]:
    126     self.method = TFIDF(min_similarity=0, top_n=top_n)
--> 127     self.matches = {"TF-IDF": self.method.match(from_list, to_list)}
    128 elif self.method in ["EditDistance", "Edit Distance"]:
    129     self.method = RapidFuzz()

File ~/.pyenv/versions/3.11.4/envs/polyfuzz/lib/python3.11/site-packages/polyfuzz/models/_tfidf.py:91, in TFIDF.match(self, from_list, to_list, re_train)
     69 """ Match two lists of strings to each other and return the most similar strings
     70 
     71 Arguments:
   (...)
     87 ```
     88 """
     90 tf_idf_from, tf_idf_to = self._extract_tf_idf(from_list, to_list, re_train)
---> 91 matches = cosine_similarity(tf_idf_from, tf_idf_to,
     92                             from_list, to_list,
     93                             self.min_similarity,
     94                             top_n=self.top_n,
     95                             method=self.cosine_method)
     97 return matches

File ~/.pyenv/versions/3.11.4/envs/polyfuzz/lib/python3.11/site-packages/polyfuzz/models/_utils.py:91, in cosine_similarity(from_vector, to_vector, from_list, to_list, min_similarity, top_n, method)
     89     indices = _top_n_idx_sparse(similarity_matrix, top_n)
     90     similarities = _top_n_similarities_sparse(similarity_matrix, indices)
---> 91     indices = np.array(np.nan_to_num(np.array(indices, dtype=np.float), nan=0), dtype=np.int)
     93 # Faster than knn and slower than sparse but uses more memory
     94 else:
     95     similarity_matrix = scikit_cosine_similarity(from_vector, to_vector)

File ~/.pyenv/versions/3.11.4/envs/polyfuzz/lib/python3.11/site-packages/numpy/__init__.py:319, in __getattr__(attr)
    314     warnings.warn(
    315         f"In the future `np.{attr}` will be defined as the "
    316         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    318 if attr in __former_attrs__:
--> 319     raise AttributeError(__former_attrs__[attr])
    321 if attr == 'testing':
    322     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
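
For reference, the error message itself points at the remedy: use the builtin `float` and `int` in place of the removed `np.float` and `np.int` aliases. Below is a minimal sketch of how the failing line from `_utils.py` could be rewritten along those lines. The `indices` value is a made-up stand-in for illustration, and the actual change in the repository may look different.

```python
import numpy as np

# Hypothetical stand-in for the `indices` value computed in _utils.py;
# NaN marks a slot that has no match.
indices = [[0, 1], [2, float("nan")]]

# The builtin float and int replace the removed np.float and np.int aliases.
as_float = np.array(indices, dtype=float)
clean = np.array(np.nan_to_num(as_float, nan=0), dtype=int)

print(clean)
```

The NaN entry is replaced by 0 before the cast to integers, which is the same behavior the original line had on older NumPy versions.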

raffaem commented 11 months ago

In the meantime you can install from git:

pip3 install git+https://github.com/MaartenGr/PolyFuzz
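
To see whether an environment is affected, it can help to print the installed versions first. The sketch below uses only the standard library. The helper name `aliases_removed` is hypothetical, and it assumes the aliases are gone from NumPy 1.24 onward, which is the release where the deprecation mentioned in the error message expired.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def aliases_removed(numpy_version: str) -> bool:
    """Return True if this NumPy version no longer provides np.float / np.int.

    Assumes the aliases were removed in NumPy 1.24.
    """
    found = re.match(r"(\d+)\.(\d+)", numpy_version)
    if found is None:
        raise ValueError(f"unrecognized version string: {numpy_version!r}")
    major, minor = int(found.group(1)), int(found.group(2))
    return (major, minor) >= (1, 24)


# Report what is installed in the current environment.
for package in ("numpy", "polyfuzz"):
    try:
        print(package, version(package))
    except PackageNotFoundError:
        print(package, "is not installed")
```

If `aliases_removed` returns True for the installed NumPy and the installed PolyFuzz predates the fix, the traceback above is the expected result.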

MaartenGr commented 10 months ago

Thanks! I'll make sure to update the version.

MaartenGr commented 10 months ago

Done, just released it!

raffaem commented 10 months ago

OK, thanks.