OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License

CI/Docker build: faster or parallel #286

Open bertsky opened 2 years ago

bertsky commented 2 years ago

In the maximum-cuda variant, the build recently exceeded the 1-hour build time limit for CircleCI free accounts. Hence the attempt to use make -j again to reduce it (the VMs are multi-core, so this should work in principle).

But it seems that we have a previously undetected race condition:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/sub-venv/headless-tf2/bin/pip' -> '/tmp/pip-0dn813kj-uninstall/usr/local/sub-venv/headless-tf2/bin/pip'

This looks like concurrent pip calls on the same venv clashing. Maybe we need to synchronize these calls the same way we already do for git calls.
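A minimal sketch of such per-venv synchronization, assuming flock(1) from util-linux is available on the build machine. The helper names venv_lock_file and with_venv_lock and the lock-file naming scheme are hypothetical illustrations, not part of ocrd_all. The idea is one exclusive lock per venv path, so jobs touching the same venv run one after another while jobs on different venvs still run in parallel.

```shell
#!/bin/sh
# Sketch (assumption): serialize pip/venv operations per venv with flock(1).
# venv_lock_file / with_venv_lock and the lock naming are hypothetical.
set -eu

# Print the lock file used for a given venv directory.
venv_lock_file() {
    # Replace slashes so the whole venv path becomes a single file name.
    printf '%s/ocrd_all_pip%s.lock\n' "${TMPDIR:-/tmp}" "$(printf '%s' "$1" | tr '/' '_')"
}

# Run a command while holding the exclusive lock for the given venv.
# Usage: with_venv_lock VENV_DIR COMMAND [ARGS...]
with_venv_lock() {
    venv=$1
    shift
    # Open the lock file on fd 9 for the subshell, take an exclusive lock,
    # then run the command; the lock is released when the subshell exits.
    (
        flock -x 9
        "$@"
    ) 9>"$(venv_lock_file "$venv")"
}
```

A recipe would then wrap every pip call for a sub-venv, e.g. with_venv_lock /usr/local/sub-venv/headless-tf2 /usr/local/sub-venv/headless-tf2/bin/pip install numpy. The fd-9 subshell idiom is used instead of passing the command directly to flock so that options of the wrapped command (such as pip's -U) cannot be mistaken for flock options.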

bertsky commented 2 years ago

I'm not sure #287 actually does what we want. It looks like we still have races:

sem --will-cite --fg --id ocrd_all_pip/usr/local/sub-venv/headless-tf1 python3 -m venv /usr/local/sub-venv/headless-tf1
sem --will-cite --fg --id ocrd_all_pip/usr/local/sub-venv/headless-tf1 python3 -m venv /usr/local/sub-venv/headless-tf1
The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt-get install python3-venv

You may need to use sudo with that command.  After installing the python3-venv
package, recreate your virtual environment.

Failing command: ['/usr/local/sub-venv/headless-tf1/bin/python3', '-Im', 'ensurepip', '--upgrade', '--default-pip']

Makefile:152: recipe for target '/usr/local/sub-venv/headless-tf1/bin/activate' failed
make[1]: *** [/usr/local/sub-venv/headless-tf1/bin/activate] Error 1
make[1]: Leaving directory '/build'
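Serializing the command may not be enough on its own: the log shows the same venv-creation command being issued twice for headless-tf1, so the second (serialized) run possibly executes against a venv the first one already created. One possible mitigation, sketched under the assumption that an existing bin/activate marks a finished venv, is to do the existence check inside the lock so the second invocation becomes a no-op. The function name ensure_venv and the lock path are hypothetical, and flock(1) stands in for sem purely for illustration.

```shell
#!/bin/sh
# Sketch (assumption): idempotent venv creation under a per-venv lock.
# ensure_venv and the lock-file naming are hypothetical, not ocrd_all code.
set -eu

ensure_venv() {
    venv=$1
    # One lock file per venv path (slashes replaced so it is a single name).
    lock="${TMPDIR:-/tmp}/ocrd_all_pip$(printf '%s' "$venv" | tr '/' '_').lock"
    (
        # Block until no other job holds the lock for this venv.
        flock -x 9
        # Check inside the critical section: a job that waited for the lock
        # sees the venv its predecessor created and does not recreate it.
        if [ -f "$venv/bin/activate" ]; then
            echo "venv $venv already exists, skipping"
        else
            python3 -m venv "$venv"
        fi
    ) 9>"$lock"
}
```

In practice this lock would have to be the same one that guards the pip calls on that venv; otherwise a pip install could still overlap with venv (re)creation.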

Or in another variant:

Successfully built ocrd-cor-asv-ann gast termcolor
Installing collected packages: numpy, protobuf, markdown, h5py, grpcio, absl-py, termcolor, tensorflow-estimator, tensorboard, scipy, python-dateutil, pyparsing, opt-einsum, kiwisolver, keras-preprocessing, keras-applications, google-pasta, gast, cycler, astor, tensorflow-gpu, matplotlib, keras, editdistance, ocrd-cor-asv-ann
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/usr/local/sub-venv/headless-tf1/bin/f2py'

Makefile:241: recipe for target '/usr/local/sub-venv/headless-tf1/bin/ocrd-cor-asv-ann-evaluate' failed
make[1]: Leaving directory '/build'
make[1]: *** [/usr/local/sub-venv/headless-tf1/bin/ocrd-cor-asv-ann-evaluate] Error 1
Makefile:236: recipe for target '/usr/bin/ocrd-cor-asv-ann-evaluate' failed
make: *** [/usr/bin/ocrd-cor-asv-ann-evaluate] Error 2

Or yet another:

Successfully installed ocrd-validators-2.30.0
Obtaining file:///build/core/ocrd
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/sub-venv/headless-tf1/bin/python3 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/build/core/ocrd/setup.py'"'"'; __file__='"'"'/build/core/ocrd/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-7cvua7u9
       cwd: /build/core/ocrd/
  Complete output (5 lines):
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/build/core/ocrd/setup.py", line 3, in <module>
      from ocrd_utils import VERSION
  ModuleNotFoundError: No module named 'ocrd_utils'
  ----------------------------------------
WARNING: Discarding file:///build/core/ocrd. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Makefile:72: recipe for target 'install' failed
make[2]: Leaving directory '/build/core'
make[2]: *** [install] Error 1
Makefile:170: recipe for target '/usr/local/sub-venv/headless-tf1/bin/ocrd' failed
make[1]: *** [/usr/local/sub-venv/headless-tf1/bin/ocrd] Error 2
make[1]: Leaving directory '/build'

bertsky commented 2 years ago

The above logs come from three different variants of our CI. I am also seeing strange effects in native installations: with -j enabled, not all modules get built.

Until we know what's going on, we should perhaps disable parallel builds (and hope the #287 speedups keep us under 1h).