dedupeio / dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
https://docs.dedupe.io
MIT License

Error installing on python 3.9? #981

Closed: rderidder-lda closed this issue 2 years ago

rderidder-lda commented 2 years ago

Hello, maybe I'm missing a dependency? Sorry, I'm not great at environment setup. I had this up and running in my Python 3.8 environment and just tried to move to 3.9. I'm assuming the error 'error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/' is a bit of a red herring (I did try installing it anyway, but it didn't make any difference). I couldn't find any install document on dedupe.io's site that mentioned anything special.

Could it be related to my PyCharm IDE or to any other dependency the package has? I found a few similar issues online, but none related to Python 3.9.

Collecting dedupe==2.0.13
  Using cached dedupe-2.0.13.tar.gz (70 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: numpy>=1.13 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (1.22.3)
Collecting Levenshtein-search==1.4.5
  Using cached Levenshtein_search-1.4.5.tar.gz (7.8 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting dedupe-hcluster
  Using cached dedupe_hcluster-0.3.9-cp39-cp39-win_amd64.whl (170 kB)
Collecting haversine>=0.4.1
  Using cached haversine-2.5.1-py2.py3-none-any.whl (6.1 kB)
Requirement already satisfied: typing-extensions in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (4.1.1)
Requirement already satisfied: zope.index in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (5.1.0)
Requirement already satisfied: rlr>=2.4.3 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (2.4.6)
Collecting doublemetaphone
  Using cached DoubleMetaphone-1.1-cp39-cp39-win_amd64.whl (28 kB)
Collecting categorical-distance>=1.9
  Using cached categorical_distance-1.9-py3-none-any.whl (3.3 kB)
Requirement already satisfied: BTrees>=4.1.4 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (4.10.0)
Collecting fastcluster
  Using cached fastcluster-1.2.6-cp39-cp39-win_amd64.whl (36 kB)
Collecting dedupe-variable-datetime
  Using cached dedupe_variable_datetime-0.1.5-py3-none-any.whl (4.8 kB)
Collecting affinegap>=1.3
  Using cached affinegap-1.12-cp39-cp39-win_amd64.whl (16 kB)
Requirement already satisfied: simplecosine>=1.2 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe==2.0.13) (1.2)
Collecting highered>=0.2.0
  Using cached highered-0.2.1-py2.py3-none-any.whl (3.3 kB)
Requirement already satisfied: persistent>=4.1.0 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from BTrees>=4.1.4->dedupe==2.0.13) (4.9.0)
Requirement already satisfied: zope.interface>=5.0.0 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from BTrees>=4.1.4->dedupe==2.0.13) (5.4.0)
Requirement already satisfied: pyhacrf-datamade>=0.2.0 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from highered>=0.2.0->dedupe==2.0.13) (0.2.6)
Requirement already satisfied: pylbfgs in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from rlr>=2.4.3->dedupe==2.0.13) (0.2.0.14)
Requirement already satisfied: future in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe-variable-datetime->dedupe==2.0.13) (0.18.2)
Requirement already satisfied: datetime-distance in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from dedupe-variable-datetime->dedupe==2.0.13) (0.1.3)
Requirement already satisfied: six in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from zope.index->dedupe==2.0.13) (1.16.0)
Requirement already satisfied: setuptools in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from zope.index->dedupe==2.0.13) (57.0.0)
Requirement already satisfied: cffi in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from persistent>=4.1.0->BTrees>=4.1.4->dedupe==2.0.13) (1.15.0)
Requirement already satisfied: python-dateutil>=2.6.0 in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from datetime-distance->dedupe-variable-datetime->dedupe==2.0.13) (2.8.2)
Requirement already satisfied: pycparser in c:\users\xxx\pycharmprojects\adm-backend\venv\lib\site-packages (from cffi->persistent>=4.1.0->BTrees>=4.1.4->dedupe==2.0.13) (2.21)

Building wheels for collected packages: dedupe, Levenshtein-search
  Building wheel for dedupe (pyproject.toml): started
  Building wheel for dedupe (pyproject.toml): finished with status 'error'
  Building wheel for Levenshtein-search (setup.py): started
  Building wheel for Levenshtein-search (setup.py): finished with status 'error'
  Running setup.py clean for Levenshtein-search
Failed to build dedupe Levenshtein-search

error: subprocess-exited-with-error

Building wheel for dedupe (pyproject.toml) did not run successfully.
exit code: 1

[42 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.9
creating build\lib.win-amd64-3.9\dedupe
copying dedupe\api.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\backport.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\blocking.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\canonical.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\canopy_index.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\clustering.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\convenience.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\core.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\datamodel.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\index.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\labeler.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\levenshtein.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\predicates.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\sampling.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\serializer.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\tfidf.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\training.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\_init.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\_typing.py -> build\lib.win-amd64-3.9\dedupe
copying dedupe\__init__.py -> build\lib.win-amd64-3.9\dedupe
creating build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\base.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\categorical_type.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\exact.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\exists.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\interaction.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\latlong.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\price.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\set.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\string.py -> build\lib.win-amd64-3.9\dedupe\variables
copying dedupe\variables\__init__.py -> build\lib.win-amd64-3.9\dedupe\variables
warning: build_py: byte-compiling is disabled, skipping.

running build_ext
building 'dedupe.cpredicates' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for dedupe
error: subprocess-exited-with-error

python setup.py bdist_wheel did not run successfully.
exit code: 1

[5 lines of output]
running bdist_wheel
running build
running build_ext
building 'Levenshtein_search' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for Levenshtein-search
ERROR: Could not build wheels for dedupe, which is required to install pyproject.toml-based projects

rderidder-lda commented 2 years ago

In case it makes a difference, I am using numpy 1.22.3. Should I downgrade to something like 1.19? (I just saw some hints on other issues that this might be it.)

fgregg commented 2 years ago

it doesn't have anything to do with numpy, as the pieces that are not building don't depend on numpy.

what is the output if you install the Microsoft Visual C++ 14.0 tools?

rderidder-lda commented 2 years ago

Thanks for getting back to me! appreciate the time.

To be precise, I ran the installation using "visualcppbuildtools_full.exe", which I downloaded from MS. Inside that installer, I selected 2 components (screenshot: MS build tool install). I restarted the machine and reran the dedupe install. I get the identical output as above.

What is more curious is that it ran fine with Python 3.8. So am I missing something, such as the Microsoft install needing to be referenced in the specific virtual environment that runs in my IDE? I use PyCharm, changed my interpreter from Python 3.8 to 3.9, and reinstalled all libraries, but this one is failing. I did have to upgrade Flask to the latest version for it to work, and I upgraded pip itself to the latest just in case.

I suppose this is specific to the Levenshtein-search library, and not dedupe itself. I have tried installing that library on its own, and I get the same error. Thanks for any tips at all... R

rderidder-lda commented 2 years ago

Having found https://wiki.python.org/moin/WindowsCompilers#Microsoft_Visual_C.2B-.2B-_14.2_standalone:_Build_Tools_for_Visual_Studio_2019_.28x86.2C_x64.2C_ARM.2C_ARM64.29 I selected a couple more components and restarted. (screenshot)

Same error appears. I'm thinking since it worked for python 3.8, i shouldn't have to install anything outside the venv just because i changed to python 3.9.. strange.

fgregg commented 2 years ago

you need to use the same compiler that was used to compile your version of python. that might have changed between 3.8 and 3.9
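
One way to see which compiler built a given interpreter is to ask Python itself. The following is a minimal sketch using only the standard library; the helper name `describe_interpreter` is made up for illustration and is not part of dedupe or pip.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the running interpreter (hypothetical helper)."""
    return {
        # Version of the interpreter executing this script, e.g. "3.9.x".
        "version": platform.python_version(),
        # Free-form string naming the compiler that built this interpreter.
        "compiler": platform.python_compiler(),
        # "64bit" or "32bit"; extension modules must match this.
        "bits": platform.architecture()[0],
        # Path of the interpreter in use, handy for confirming the active venv.
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this once under the 3.8 interpreter and once under the 3.9 interpreter should show whether the two report different compiler strings.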

rderidder-lda commented 2 years ago

lol.. i'm a bit of a newbie.. i'm reading that sentence over and over trying to understand it.. my 'compiler' is pycharm.. in pycharm i can switch between 2 interpreters being used (python 3.8 and 3.9). When i'm in 3.8, the environment has all the libraries installed and working great. When i switch to 3.9, I can't get past the install of the dedupe library (apparently due to the levenshtein-search dependency having this install issue)... so i'm not sure where exactly you are mentioning something i can control.. I've tried uninstalling python 3.9, and reinstalling / re-setting up the pycharm 3.9 interpreter and environment.. but no change..

fgregg commented 2 years ago

i'm sorry you are having trouble getting everything set up, but i don't think i can help you further with your environment.

this is a good argument for #976, so that we can provide binaries for more platforms.

rderidder-lda commented 2 years ago

if i understand it right.. levenshtein-search provided a wheel for python 3.8, and so no compiling was needed when it was installed.. but they do not supply a wheel for 3.9.. so in order to compile it, we need a certain number of things installed for MS C++. I'm not sure what exactly those things are.. as above, I tried 4 of the modules per the MS website.. but maybe more are required? Maybe other folks do not report this issue because they already had these MS items installed and so the wheel is compiled without them knowing? However after much head-banging, i do believe this is not really a local issue, but an issue with the fact that levenshtein-search does not have a 3.9 wheel.. please do correct me if i'm wrong here. I will now try and install more MS modules to see if i can get this library to compile..
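
The pip log earlier in this thread fits this reading: the packages fetched as `.whl` files installed without any build step, while the two fetched as `.tar.gz` source archives (dedupe itself and Levenshtein-search) were the ones that needed a compiler. Below is a small sketch that sorts filenames from a pip log into prebuilt wheels and source distributions. The function name and the regular expression are illustrative assumptions, not part of pip, and the sample text is copied from the log above.

```python
import re

# Matches the filename in pip lines such as
# "Using cached foo-1.0-cp39-cp39-win_amd64.whl (36 kB)".
CACHED_FILE = re.compile(r"Using cached (\S+\.(?:whl|tar\.gz|zip))")


def split_wheels_and_sdists(pip_log):
    """Return (wheels, sdists) found in a pip log (hypothetical helper)."""
    wheels, sdists = [], []
    for name in CACHED_FILE.findall(pip_log):
        # Wheels are prebuilt; anything else must be built locally.
        (wheels if name.endswith(".whl") else sdists).append(name)
    return wheels, sdists


sample_log = (
    "Collecting dedupe==2.0.13 Using cached dedupe-2.0.13.tar.gz (70 kB) "
    "Collecting Levenshtein-search==1.4.5 "
    "Using cached Levenshtein_search-1.4.5.tar.gz (7.8 kB) "
    "Collecting fastcluster "
    "Using cached fastcluster-1.2.6-cp39-cp39-win_amd64.whl (36 kB)"
)

wheels, sdists = split_wheels_and_sdists(sample_log)
print("prebuilt wheels:", wheels)
print("need building:", sdists)
```

Any name that lands in the second list has C extensions to compile only if the package ships them; in this thread both did, which is why both builds stopped at the missing compiler.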

rderidder-lda commented 2 years ago

if anyone does have a 3.9 wheel for windows 10 64bit, i'd love to download it.. thanks

rderidder-lda commented 2 years ago

Success.. i added a bunch more modules from MS, and it can now compile. The list can likely be narrowed further, but it's better than installing the entire VS. (screenshot: MS build tool install)