OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License
72 stars 17 forks source link

isolate pip with venv semaphores #287

Closed bertsky closed 2 years ago

bertsky commented 2 years ago

Pursuant to #286, try using the same mechanism we already use for git to bring about isolation for pip as well – but allow different venvs to be modified concurrently.

Also contains an attempt at speeding up build by replacing the costly re-pip trick (forcing reinstallation to ensure up-to-date timestamps) with just direct timestamp update (but for the targets only).

bertsky commented 2 years ago

Remaining CI failure seems to be only the stdout timeout for docker save. I have increased it on master.

bertsky commented 2 years ago

Remaining CI failure seems to be only the stdout timeout for docker save. I have increased it on master.

No idea why that update does not take effect in PRs immediately. (Anyway – should work, PR is ready to be merged AFAICS.)

bertsky commented 2 years ago

This does not update the modules themselves, so we need not run release.sh – can I merge this, @kba? (We are still waiting for ocrd/all/maximum-cuda to finish...)

bertsky commented 2 years ago

CI failure: bad news – new upstream inconsistencies …

numpy>=1.20.0 (from imageio==2.16.0->scikit-image==0.17.2->ocr4all-pixel-classifier==0.6.5->-r requirements.in
numpy~=1.19.2 (from tensorflow==2.6.2->ocr4all-pixel-classifier==0.6.5->-r requirements.in

This came from ocrd_pc_segmentation – I suggest we try to fix that upstream.

bertsky commented 2 years ago

This came from ocrd_pc_segmentation – I suggest we try to fix that upstream.

Strange. ocr4all-pixel-classifier requirements does not even allow TF 2.6 – it requries <=2.5. And no restriction on scikit-image. But somehow that ends up in the above combination. It has to do with the mysterious pip-compile mechanism.

@chreul @maxnth @chaddy314 could you please assist?

maxnth commented 2 years ago

@bertsky I just contacted the original author of the pixel-classifier and he's looking into fixing it upstream.

Edit: Just for the sake of clarification: the ocr4all-pixel-classifier isn't written or maintained by the OCR4all team.

crater2150 commented 2 years ago

The problem seems to be, that scikit-image depends on imageio>=2.4.1. Note the missing upper bound on the version. With python packages, the only option to have a build that will still work in the future seems to be to hardcode all dependencies including transitives to a fixed version :(

I've added imageio ~= 2.4.1 to ocr4all-pixel-classifier's dependencies as a workaround, and also re-added the full version-pinned requirements.txt. I also published a release 0.6.6 on PyPI.

bertsky commented 2 years ago

The problem seems to be, that scikit-image depends on imageio>=2.4.1. Note the missing upper bound on the version. With python packages, the only option to have a build that will still work in the future seems to be to hardcode all dependencies including transitives to a fixed version :(

Indeed. Which will likely cause further conflicts as more packages are needed in the same venv.

Could anything be done on the TF version front? What was the reason for TF<=2.5 in the first place? (I have recently seen that the HDF5 model format quickly becomes incompatible with other minor releases of Python, because the custom layer code is serialised via marshal format. The better alternative is TF's SavedModel format, which is basically a directory with Protobuf file and some resource files. Conversion is as simple as an interactive load+save. Would that help to make ocr4all-pixel-classifier models run on the newer versions?

I've added imageio ~= 2.4.1 to ocr4all-pixel-classifier's dependencies as a workaround, and also re-added the full version-pinned requirements.txt. I also published a release 0.6.6 on PyPI.

Thanks @crater2150 – that helped!

crater2150 commented 2 years ago

Could anything be done on the TF version front? What was the reason for TF<=2.5 in the first place?

I think, 2.5 was the most recent TF version, when that line was added, it simply hasn't been tested with newer versions.

I haven't had any problems with the HDF5 models between TF 2.0 and 2.5, so I'd say chances are good a newer TF version will just work. There isn't anyone actively developing the pixel classifier anymore, so larger changes to the codebase probably won't happen from our side.