OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License

Dockerfile: remove /build/core/.git directory to overwrite with submodule link #442

Closed kba closed 4 months ago

kba commented 4 months ago

Otherwise build fails with

#11 ERROR: cannot replace to directory /.../build/core/.git with file

I am not sure how we introduced this, and the workaround (rm -rf /build/core/.git) is not pretty, so if somebody has a better solution, I'd be happy to change it.
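To illustrate the workaround, here is a minimal sketch of the idea in a scratch directory. The error message suggests that a .git directory already sits where the submodule's .git link file should go, and a file cannot replace a directory in place. The temporary paths and the `gitdir:` target below are assumptions for illustration only; the real build operates on /build/core inside the image.

```shell
set -eu

# Hypothetical scratch layout standing in for /build inside the image.
workdir=$(mktemp -d)
mkdir -p "$workdir/build/core/.git"          # stale .git *directory*
mkdir -p "$workdir/build/.git/modules/core"  # assumed location of the submodule's git data

gitlink="$workdir/build/core/.git"

# A plain file cannot overwrite a directory, so remove the directory first.
if [ -d "$gitlink" ]; then
    rm -rf "$gitlink"
fi

# Write the submodule link: a one-line file pointing at the real git directory.
printf 'gitdir: ../.git/modules/core\n' > "$gitlink"

cat "$gitlink"
```

Guarding the removal with `[ -d ... ]` keeps the step harmless when the path is already a link file.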

kba commented 4 months ago

CI failed as expected with

ocrd/core-cuda:v2.65.0 inconsistent with core version v2.65.0-1-gcc6ea575

which is good, but I am also updating core in this PR to make sure the check passes when everything is set up properly.

kba commented 4 months ago

Ah, but of course there is a catch. The check won't work if we derive from a non-versioned Docker base image such as ocrd/core:minimum-cuda.

I still think the check is a good idea and will open an issue for it but comment it out for now because we really need a new release urgently.
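A rough sketch of how such a consistency check could look, and why it breaks down for non-versioned tags. The function name `check_base_image_version` is hypothetical and not taken from the repository; skipping non-versioned tags is just one possible way to handle the catch, not necessarily what the Makefile will end up doing.

```shell
# Compare the tag of the base image with the core version string
# (the latter in the form that `git describe` prints).
check_base_image_version() {
    base_image=$1
    core_version=$2
    tag=${base_image##*:}   # everything after the last colon

    case $tag in
        v[0-9]*) ;;         # looks like a version tag, so it can be compared
        *)
            echo "skipped: $base_image carries no version tag"
            return 0
            ;;
    esac

    if [ "$tag" != "$core_version" ]; then
        echo "$base_image inconsistent with core version $core_version"
        return 1
    fi
    echo "ok: $base_image matches core version $core_version"
}

# Mismatch: the core checkout is one commit past the release tag.
check_base_image_version ocrd/core-cuda:v2.65.0 v2.65.0-1-gcc6ea575 || true
# Non-versioned tag: nothing to compare against.
check_base_image_version ocrd/core:minimum-cuda v2.65.0-1-gcc6ea575
```

With a tag like minimum-cuda there is no version in the image name at all, so the comparison would need another source, such as an image label, to be meaningful.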

kba commented 4 months ago

It looks like make docker-maximum-cuda does not just build that image:

> make -n docker-maximum-cuda NO_UPDATE=1
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/core-cuda:v2.66.1 \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_tesserocr ocrd_wrap workflow-configuration ocrd_olahd_client" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:minimum-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:minimum-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann dinglehopper docstruct format-converters nmalign ocrd_calamari ocrd_cis ocrd_fileformat ocrd_im6convert ocrd_keraslm ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_segment ocrd_tesserocr ocrd_wrap workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:medium-cuda .
docker build \
--progress=plain \
--build-arg BASE_IMAGE=ocrd/all:medium-cuda \
--build-arg VCS_REF=$(git rev-parse --short HEAD) \
--build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--build-arg OCRD_MODULES="cor-asv-ann core dinglehopper docstruct eynollah format-converters nmalign ocrd_anybaseocr ocrd_calamari ocrd_cis ocrd_detectron2 ocrd_doxa ocrd_fileformat ocrd_froc ocrd_im6convert ocrd_keraslm ocrd_kraken ocrd_olahd_client ocrd_olena ocrd_pagetopdf ocrd_repair_inconsistencies ocrd_segment ocrd_tesserocr ocrd_wrap sbb_binarization workflow-configuration" \
--build-arg PIP_OPTIONS="-e" \
--build-arg PARALLEL="" \
--build-arg PYTHON="python3" \
--network=host \
-t ocrd/all:maximum-cuda .

Which leads to exceeding the 1h time limit...

kba commented 4 months ago

That is the correct behavior as of #436. Upping the resource_class from the implied medium to large fixes the build time. Merged into #441.

bertsky commented 4 months ago

Which leads to exceeding the 1h time limit...

That's odd. We used to be well below 1h with the new multi-stage build (core → minimum → medium → maximum), which we switched to recently in #436. In fact, it was just 47min. It now times out during layer export – perhaps in this case, merely the network side happened to be slow?

Or is resource_class: large somehow having an opposite effect here?
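For reference, the chaining in the dry-run output can be sketched as a loop in which each stage takes the previous stage's tag as its BASE_IMAGE. This is a simplified illustration that only prints abbreviated commands; the loop and its variable names are assumptions, not the actual Makefile logic, and most build arguments are left out.

```shell
# Start from the versioned core image, as in the dry-run output.
base="ocrd/core-cuda:v2.66.1"

for stage in minimum medium maximum; do
    target="ocrd/all:${stage}-cuda"
    # Print instead of running: each stage builds on top of the previous one.
    line="docker build --build-arg BASE_IMAGE=${base} -t ${target} ."
    echo "$line"
    base=$target
done
```

Because every stage depends on the one before it, asking for the maximum image on a clean builder necessarily builds all three in sequence, which is where the build time adds up.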

bertsky commented 4 months ago

Upping the resource_class from the implied medium to large fixes the build time. Merged into #441.

Ah, I did not notice this was the most recent change. Thanks!