OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License
39 stars 11 forks source link

pip install -U pip before pip install #138

Closed kba closed 4 years ago

kba commented 4 years ago

Let's see whether that fixes the issue. I guess opencv-headless required skbuild but dropped that requirement recently, but haven't checked yet.

kba commented 4 years ago

Definitely an issue with opencv. Adding scikit-build fixed that particular issue but now throws this:

Installing collected packages: pyrsistent, jsonschema, click, lxml, numpy, Pillow, ocrd-utils, ocrd-models, ocrd-modelfactory, bagit, shapely, urllib3, certifi, chardet, requests, bagit-profile, pyyaml, ocrd-validators, atomicwrites, MarkupSafe, Jinja2, itsdangerous, Werkzeug, Flask, wrapt, Deprecated, opencv-python-headless, ocrd, tesserocr
  Running setup.py install for opencv-python-headless ... - error
    Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-fj9bt_9w/opencv-python-headless/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-uwpfxbcp-record/install-record.txt --single-version-externally-managed --compile:
    /usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
      warnings.warn(msg)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/skbuild/setuptools_wrap.py", line 560, in setup
        cmkr = cmaker.CMaker(cmake_executable)
      File "/usr/local/lib/python3.6/dist-packages/skbuild/cmaker.py", line 95, in __init__
        self.cmake_version = get_cmake_version(self.cmake_executable)
      File "/usr/local/lib/python3.6/dist-packages/skbuild/cmaker.py", line 82, in get_cmake_version
        "Problem with the CMake installation, aborting build. CMake executable is %s" % cmake_executable)

    Problem with the CMake installation, aborting build. CMake executable is cmake

    ----------------------------------------

Probably should bisect recent opencv-headless versions to find the culprit.

kba commented 4 years ago

:fire: eek, this is now happening for core as well (but only py35 and py36). https://app.circleci.com/pipelines/github/OCR-D/core/625/workflows/cae742df-a08b-4a16-a569-58becbb38c55/jobs/1607

bertsky commented 4 years ago

Let's see whether that fixes the issue. I guess opencv-headless required skbuild but dropped that requirement recently, but haven't checked yet.

Can confirm that. opencv-python's setup.py used to install its scikit-build on the fly (by calling pip cmdline). Now it outright imports it on the module level, cf here.

Don't know how that is supposed to work, but: apparently we are not required to install this ourselves. Instead, this issue says our pip is simply too old.

/usr/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'

To me that's rather indicative of an outdated setuptools or pip.

Which reminds me that nowhere in our CI setup we update them (we only fetch the most recent system packages).

So I suggest rolling back the scikit-build requirement and instead introduce an pip install -U pip step in deps.

bertsky commented 4 years ago

ek, this is now happening for core as well (but only py35 and py36).

Same diagnosis IMO: In core we do have a pip install -U pip step, but as part of install's recipe, not for deps. It must be the first thing we do in the whole chain.

codecov[bot] commented 4 years ago

Codecov Report

Merging #138 into master will increase coverage by 0.09%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #138      +/-   ##
==========================================
+ Coverage   36.26%   36.36%   +0.09%     
==========================================
  Files           9        9              
  Lines        1012     1012              
  Branches      224      224              
==========================================
+ Hits          367      368       +1     
  Misses        575      575              
+ Partials       70       69       -1     
Impacted Files Coverage Δ
ocrd_tesserocr/segment_region.py 53.33% <0.00%> (+0.66%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1159fbf...1160946. Read the comment docs.

kba commented 4 years ago

So I suggest rolling back the scikit-build requirement and instead introduce an pip install -U pip step in deps.

That fixed it :) Thanks! https://app.circleci.com/pipelines/github/OCR-D/ocrd_tesserocr/255/workflows/6604b13c-774e-4cf7-98df-26c8f937de41/jobs/248

skvark commented 4 years ago

Can confirm that. opencv-python's setup.py used to install its scikit-build on the fly (by calling pip cmdline). Now it outright imports it on the module level, cf here.

Don't know how that is supposed to work, but: apparently we are not required to install this ourselves. Instead, this issue says our pip is simply too old.

Hello, noticed this issue via linked issues. For clarity, scikit-build gets installed via pyproject.toml which is the new way of defining for example build systems for Python packages. See: https://snarky.ca/what-the-heck-is-pyproject-toml/

You were probably using the old manylinux1 pre-built wheels earlier. There wasn't source distribution in PyPI then so everything worked fine. After manylinux2014 was introduced along with the source distributions, old version of pip bailed out on the pre-built wheels because it doesn't understand that new standard. Then it tried to compile from source with the new source distribution. However, that failed also due to too old pip version because it didn't understand the build system definition in pyproject.toml.

kba commented 4 years ago

Thanks @skvark for the explanation, much appreciated. The skbuild problem forced us to update pip which is a good thing. Also good to know about pyproject.toml, we will discuss whether it might be worth switching.