OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

CD: also deploy CUDA ML variants, and build multi-platform images #1239

Closed bertsky closed 1 week ago

bertsky commented 1 month ago

Useful if we want to start configuring processor module Dockerfiles based on core-cuda-tf1, tf2 or torch instead of just core or core-cuda. (This in turn can be useful for ocrd_all with service containers, because most of the layers can be shared.)

bertsky commented 1 month ago

Last two commits: I am experimenting with getting builds for other architectures as well. However, further changes are required, because there are usually no OpenCV and lxml binaries available for those platforms.

For lxml it suffices to add libxml2-dev and libxslt-dev.

But for opencv-python-headless, on arm/v7 we need a newer version of cmake to even get scikit-build to compile. We cannot use the PPA from savoury1 because it is misconfigured (there is no matching version of cmake-data in those repositories), and the other PPAs which do work are not available for ppc64le.

bertsky commented 1 month ago

So in b2ae951a628b2a1e24e7c257eba5591d6ef2ccc1, in order to get opencv-python-headless to compile via sdist on arm/v7, which due to cmake (as Python package) not being available as a prebuilt binary on that platform is necessary, I added --no-build-isolation as a workaround for https://github.com/scikit-build/cmake-python-distributions/issues/503. But that unfortunately breaks lots of other build variants (see CI), so I'll revert that part.

The current latest tag on Docker Hub already gives a preview of the multi-platform build (so far only for ocrd/core).

bertsky commented 3 weeks ago

So in b2ae951, in order to get opencv-python-headless to compile via sdist on arm/v7, which due to cmake (as Python package) not being available as a prebuilt binary on that platform is necessary, I added --no-build-isolation as a workaround for scikit-build/cmake-python-distributions#503. But that unfortunately breaks lots of other build variants (see CI), so I'll revert that part.

Thus, in 67f77ae I disabled that platform for now. See here for why this is currently not fixable.

bertsky commented 2 weeks ago

@MehmedGIT can you please help? The CI fails for py310 ubuntu:22 only. The only context is

 Network ocrd_network_test  Creating
 Network ocrd_network_test  Created
 Container ocrd_network_mongo_db  Creating
 Container ocrd_network_rabbit_mq  Creating
 Container ocrd_network_mongo_db  Created
 Container ocrd_network_rabbit_mq  Created
 Container ocrd_network_processing_server  Creating
 Container ocrd_network_processing_server  Created
 Container core_test  Creating
 Container network-ocrd_dummy_processing_worker-1  Creating
 Container core_test  Created
 Container network-ocrd_dummy_processing_worker-1  Created
 Container ocrd_network_rabbit_mq  Starting
 Container ocrd_network_mongo_db  Starting
 Container ocrd_network_rabbit_mq  Started
 Container ocrd_network_mongo_db  Started
 Container ocrd_network_rabbit_mq  Waiting
 Container ocrd_network_mongo_db  Waiting
 Container ocrd_network_mongo_db  Error
 Container ocrd_network_rabbit_mq  Healthy
dependency failed to start: container ocrd_network_mongo_db exited (48)
make: *** [Makefile:293: network-integration-test-cicd] Error 1

Could this be a (previously undetected) timing glitch?

MehmedGIT commented 2 weeks ago

@bertsky, I face that issue from time to time as well. Usually, restarting the test helps. I think we could improve that by increasing the number of retries in the health check in the docker-compose file from 30 to 90:

healthcheck:
  test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
  interval: 1s
  timeout: 3s
  retries: 90

Sometimes starting MongoDB takes longer than the total time the health check allows.
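
To make the arithmetic behind this suggestion explicit, here is a rough sketch of how long the health check can keep waiting before the container is marked unhealthy. It assumes that every probe fails, that a fast failure costs about one `interval` and a hanging probe about `interval + timeout`, and that no `start_period` is configured. The function name `healthcheck_budget` is made up for illustration.

```python
from typing import Tuple


def healthcheck_budget(interval: float, timeout: float, retries: int) -> Tuple[float, float]:
    """Rough bounds (in seconds) on how long a container may keep failing
    its health check before being marked unhealthy.

    Assumption: every probe fails; a fast failure costs about `interval`,
    a hanging probe costs about `interval + timeout`.
    """
    fastest = retries * interval
    slowest = retries * (interval + timeout)
    return fastest, slowest


if __name__ == "__main__":
    # Values from the healthcheck above: interval 1s, timeout 3s
    for retries in (30, 90):
        low, high = healthcheck_budget(interval=1, timeout=3, retries=retries)
        print(f"retries={retries}: roughly {low:.0f}s to {high:.0f}s")
```

Under these assumptions, going from 30 to 90 retries raises the shortest possible window from about 30 seconds to about 90 seconds.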

bertsky commented 1 week ago

So the last commit ensured that GitHub will show our pytest and flake8 results directly.

However, IMO we should modify our Makefile to make sure that PYTEST_ARGS is used in every pytest call. Otherwise my recipe…

make test benchmark PYTEST_ARGS=--junitxml=test-results/test.xml

…will not pick up its results.
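
One way to find the offending calls is to scan the Makefile for recipe lines that run pytest without `$(PYTEST_ARGS)`. The following is only a sketch: the helper name `pytest_calls_missing_args` and the sample recipes are hypothetical and not taken from core's actual Makefile, and the matching is a simple text heuristic (it does not follow backslash continuations), not a real Makefile parser.

```python
import re

# Matches invocations such as "pytest", "python -m pytest" or "bin/pytest"
PYTEST_CALL = re.compile(r"(^|[\s/])pytest(\s|$)")


def pytest_calls_missing_args(makefile_text: str, variable: str = "PYTEST_ARGS"):
    """Return (line_number, line) pairs for recipe lines that call pytest
    without referencing $(variable) or ${variable}."""
    missing = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        if not line.startswith("\t"):
            continue  # only tab-indented recipe lines run commands
        if not PYTEST_CALL.search(line):
            continue
        if f"$({variable})" in line or f"${{{variable}}}" in line:
            continue
        missing.append((number, line.strip()))
    return missing


# Hypothetical Makefile excerpt, for illustration only
SAMPLE = (
    "test:\n"
    "\t$(PYTHON) -m pytest $(PYTEST_ARGS) tests\n"
    "benchmark:\n"
    "\t$(PYTHON) -m pytest tests/benchmark\n"
)

if __name__ == "__main__":
    for number, line in pytest_calls_missing_args(SAMPLE):
        print(f"line {number}: {line}")
```

In the sample, only the `benchmark` recipe is reported, because it is the one that would silently ignore `--junitxml=...` passed via `PYTEST_ARGS`.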

bertsky commented 1 week ago

Also, does anybody know what's wrong with the Scrutinizer setup?

kba commented 1 week ago

Also, does anybody know what's wrong with the Scrutinizer setup?

AFAICS it hangs because apt is not called with -y, so it asks questions and times out waiting for an answer. Nothing that cannot be fixed later.