HazyResearch / pdftotree

A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
MIT License

Changes to fix deprecated sklearn dependency. #125

Open HodeiG opened 1 year ago

Description of the problems or issues

Is your pull request related to a problem? Please describe.

During pdftotree installation I get this error:

[16:00:54 /tmp]$ python3.11 -m venv .venv
[16:01:03 /tmp]$ source .venv/bin/activate
(venv)|[16:01:08 /tmp]$ pip install pdftotree
Collecting pdftotree
  Using cached pdftotree-0.5.0-py3-none-any.whl (56 kB)
Collecting IPython
  Using cached ipython-8.14.0-py3-none-any.whl (798 kB)
Collecting beautifulsoup4
  Using cached beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting keras>=2.4.0
  Using cached keras-2.12.0-py2.py3-none-any.whl (1.7 MB)
Collecting numpy
  Using cached numpy-1.25.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Collecting pandas
  Using cached pandas-2.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
Collecting pdfminer.six>=20191020
  Using cached pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Collecting pillow
  Using cached Pillow-9.5.0-cp311-cp311-manylinux_2_28_x86_64.whl (3.4 MB)
Collecting selectivesearch
  Using cached selectivesearch-0.4-py3-none-any.whl
Collecting sklearn
  Using cached sklearn-0.0.post5.tar.gz (3.7 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.

      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error

      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package

      If the previous advice does not cover your use case, feel free to report it at
      https://github.com/scikit-learn/sklearn-pypi-package/issues/new
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

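Does your pull request fix any issue?

As per the sklearn documentation (https://pypi.org/project/sklearn/), the 'sklearn' PyPI package is deprecated and 'scikit-learn' should be used instead, which matches the error output above.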

Description of the proposed changes

Update setup.py to depend on scikit-learn instead of the deprecated sklearn package.
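
For illustration, here is a minimal sketch of the kind of rename involved. It is not the actual diff from this pull request. The dependency names are taken from the pip output above, but the exact contents of pdftotree's setup.py are an assumption, and `fix_requirement` and `DEPRECATED_TO_CURRENT` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical mapping from a deprecated PyPI name to the real distribution name.
DEPRECATED_TO_CURRENT = {"sklearn": "scikit-learn"}


def fix_requirement(req: str) -> str:
    """Swap a deprecated distribution name, keeping any version specifier."""
    match = re.match(r"^\s*([A-Za-z0-9._-]+)(.*)$", req)
    if match is None:
        return req  # leave anything unrecognised untouched
    name, rest = match.groups()
    return DEPRECATED_TO_CURRENT.get(name.lower(), name) + rest


# Dependency names as they appear in the pip log above; assumed (not verified)
# to mirror the install_requires list in setup.py.
install_requires = [
    "IPython",
    "beautifulsoup4",
    "keras>=2.4.0",
    "numpy",
    "pandas",
    "pdfminer.six>=20191020",
    "pillow",
    "selectivesearch",
    "sklearn",
]

fixed = [fix_requirement(r) for r in install_requires]
print(fixed)
```

In setup.py itself the change would probably amount to replacing the single string "sklearn" with "scikit-learn". The import name in Python code stays `sklearn`; only the name given to pip changes.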

Test plan

No test plan

Checklist