ArtLabss / open-data-anonymizer

Python Data Anonymization & Masking Library For Data Science Tasks
https://www.artlabs.tech
BSD 3-Clause "New" or "Revised" License
234 stars, 29 forks

[BUG]: #28

Open: KrishPatel13 opened this issue 1 year ago

KrishPatel13 commented 1 year ago

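Describe the bug: Importing anonympy.pdf fails with ModuleNotFoundError: No module named 'cape_privacy'. Installing cape-privacy to fix that also fails, because pip cannot build numpy from source.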

To Reproduce: steps to reproduce the behavior.

My script:

from anonympy.pdf import pdfAnonymizer

# Paths are passed explicitly because Tesseract and Poppler are not on the system PATH
anonym = pdfAnonymizer(
    path_to_pdf="embedded_text.pdf",
    pytesseract_path=r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    poppler_path=r"C:\Users\Krish Patel\Downloads\poppler-23.07.0\Library\bin",
)

# Calling the generic function
anonym.anonymize(
    output_path="output.pdf", remove_metadata=True, fill="red", outline="black"
)

I have already downloaded both Poppler and Tesseract.
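Before running the script, a quick check that both downloads are where the script points can rule out path problems. This is a minimal sketch, not part of anonympy: it reuses the paths from the script above, and it assumes (as the poppler_path parameter suggests) that pages are rasterized through Poppler's pdfinfo and pdftoppm tools, which is how the pdf2image library uses Poppler. The helper name missing_tools is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Paths copied from the script above; adjust them to your own install locations.
TESSERACT_EXE = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
POPPLER_BIN = Path(r"C:\Users\Krish Patel\Downloads\poppler-23.07.0\Library\bin")


def missing_tools(tesseract_exe: Path, poppler_bin: Path) -> list[str]:
    """Hypothetical helper: describe any problem with the Tesseract/Poppler paths."""
    problems = []
    if not tesseract_exe.is_file():
        problems.append(f"Tesseract executable not found: {tesseract_exe}")
    if not poppler_bin.is_dir():
        problems.append(f"Poppler bin directory not found: {poppler_bin}")
    else:
        # Assumption: Poppler is used via pdf2image, which calls pdfinfo and pdftoppm.
        # On Windows these carry an .exe suffix, so accept either name.
        for tool in ("pdfinfo", "pdftoppm"):
            if not any((poppler_bin / name).is_file() for name in (tool, tool + ".exe")):
                problems.append(f"{tool} not found in {poppler_bin}")
    return problems


if __name__ == "__main__":
    for problem in missing_tools(TESSERACT_EXE, POPPLER_BIN):
        print(problem)
```

An empty result only means the files exist; it does not prove the binaries run.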

Then I ran:

pip install anonympy

This completed successfully.

Then I ran the script and got the following error:

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction> python .\test_open_data_anony_pdf.py
Traceback (most recent call last):
  File "P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction\test_open_data_anony_pdf.py", line 1, in <module>
    from anonympy.pdf import pdfAnonymizer
  File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\__init__.py", line 1, in <module>
    from anonympy import pandas
  File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\pandas\__init__.py", line 6, in <module>
    from anonympy.pandas.core_pandas import dfAnonymizer
  File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\pandas\core_pandas.py", line 6, in <module>
    from cape_privacy.pandas import dtypes
ModuleNotFoundError: No module named 'cape_privacy'
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction> 
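The traceback shows that the failure is not in the PDF code. Importing anonympy.pdf first runs anonympy/__init__.py, which imports anonympy.pandas, whose core_pandas.py imports cape_privacy when it loads. So cape_privacy must be installed even for PDF-only use. A small check like the following (a hypothetical helper, not part of anonympy) reports which of these modules are missing without triggering that import chain, because importlib.util.find_spec only locates a top-level module and does not execute it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path.

    find_spec() locates a top-level module without importing it, so this does
    not run anonympy/__init__.py and cannot itself raise the ImportError above.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules named in the traceback above; extend the list as needed.
    print(missing_modules(["anonympy", "cape_privacy"]))
```

In the environment from this report, the expected result would be that anonympy is found and cape_privacy is reported as missing.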

Then I ran pip install cape-dataframes, which completed successfully.

Next, running pip install cape-privacy failed with the following error:

(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction> pip install cape-privacy
 ...
 /Tcnumpy\core\src\multiarray\scalarapi.c /Fobuild\temp.win-amd64-3.10\Release\numpy\core\src\multiarray\scalarapi.obj
        C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Inumpy\core\include -Ibuild\src.win-amd64-3.1\numpy\core\include/numpy -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -I"C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\include" -I"C:\Program Files\Python310\include" -I"C:\Program Files\Python310\Include" -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" /Tcbuild\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.c /Fobuild\temp.win-amd64-3.10\Release\build\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.obj
        error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Inumpy\core\include -Ibuild\src.win-amd64-3.1\numpy\core\include/numpy -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -I"C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\include" -I"C:\Program Files\Python310\include" -I"C:\Program Files\Python310\Include" -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" /Tcbuild\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.c /Fobuild\temp.win-amd64-3.10\Release\build\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.obj" failed with exit status 2
        scalartypes.c
        numpy\core\include\numpy/npy_3kcompat.h(198): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\common.h(269): warning C4244: 'return': conversion from 'npy_intp' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(482): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(488): warning C4996: 'PyUnicode_FromUnicode': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(482): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(483): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(488): warning C4996: 'PyUnicode_FromUnicode': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(516): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(517): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(1912): warning C4244: 'function': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(1912): warning C4244: 'function': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(1866): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(1867): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
        numpy\core\src\multiarray\scalartypes.c.src(1871): warning C4996: 'PyObject_AsReadBuffer': deprecated in 3.0
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2788): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(2788): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
        numpy\core\src\multiarray\scalartypes.c.src(3228): error C2440: 'function': cannot convert from 'double' to 'PyObject *'   
        numpy\core\src\multiarray\scalartypes.c.src(3228): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3228): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3237): error C2440: 'function': cannot convert from 'double' to 'PyObject *'   
        numpy\core\src\multiarray\scalartypes.c.src(3237): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3236): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3243): error C2440: 'function': cannot convert from 'double' to 'PyObject *'   
        numpy\core\src\multiarray\scalartypes.c.src(3243): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3242): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3228): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
        numpy\core\src\multiarray\scalartypes.c.src(3228): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3228): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3237): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
        numpy\core\src\multiarray\scalartypes.c.src(3237): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3236): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3243): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
        numpy\core\src\multiarray\scalartypes.c.src(3243): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3242): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(3258): error C2440: 'function': cannot convert from 'double' to 'PyObject *'   
        numpy\core\src\multiarray\scalartypes.c.src(3258): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
        numpy\core\src\multiarray\scalartypes.c.src(3258): error C2198: '_Py_HashDouble': too few arguments for call
        numpy\core\src\multiarray\scalartypes.c.src(4478): warning C4244: 'return': conversion from 'npy_intp' to 'int', possible loss of data
        [end of output]

        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed building wheel for numpy
        Running setup.py clean for numpy
        error: subprocess-exited-with-error

        python setup.py clean did not run successfully.
        exit code: 1

        [10 lines of output]
        Running from numpy source directory.

        `setup.py clean` is not supported, use one of the following instead:

          - `git clean -xdf` (cleans all files)
          - `git clean -Xdf` (cleans all versioned files, doesn't touch
                              files that aren't checked into the git repo)

        Add `--force` to your command to use it anyway if you must (unsupported).

        [end of output]

        note: This error originates from a subprocess, and is likely not a problem with pip.
        ERROR: Failed cleaning build dir for numpy
      Failed to build numpy
      ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects

      [notice] A new release of pip is available: 23.1.2 -> 23.2
      [notice] To update, run: python.exe -m pip install --upgrade pip
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

[notice] A new release of pip is available: 23.1.2 -> 23.2
[notice] To update, run: python.exe -m pip install --upgrade pip
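The package that actually fails here is numpy, not cape-privacy itself. pip is compiling numpy from source while installing build dependencies ("pip subprocess to install build dependencies did not run successfully"). The C errors on _Py_HashDouble are consistent with an older numpy source release being compiled against Python 3.10, where that private CPython function gained an extra PyObject * parameter. The sketch below uses hypothetical helpers that are not part of pip or anonympy. failed_wheels extracts the failing package names from a saved pip log. wheel_only_install_cmd builds a pip command with pip's --only-binary :all: option, which makes pip use prebuilt wheels only, so a missing wheel surfaces as a resolution error instead of a compiler failure.

```python
import re
import sys

# Matches pip lines such as "ERROR: Failed building wheel for numpy".
FAILED_WHEEL = re.compile(r"Failed building wheel for (\S+)")


def failed_wheels(log_text):
    """Return the distinct package names pip failed to build, in first-seen order."""
    names = []
    for match in FAILED_WHEEL.finditer(log_text):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


def wheel_only_install_cmd(package):
    """Build a pip command line that never compiles anything from source.

    --only-binary :all: restricts pip to prebuilt wheels; the current
    interpreter is used so the install lands in the active virtualenv.
    """
    return [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", package]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to actually execute it.
    print(" ".join(wheel_only_install_cmd("cape-privacy")))
```

If the wheel-only install fails, the error names the requirement with no wheel for Python 3.10. That points to either an older interpreter or a dependency that anonympy can use in place of cape_privacy.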

Expected behavior: The script should run and write the anonymized PDF to output.pdf.


Desktop: Windows 64-bit (win-amd64), Python 3.10 in a Poetry virtualenv, pip 23.1.2, Microsoft Visual Studio 2022 Build Tools (MSVC 14.36).
