OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License

[no ci] [RFC] Rework build process #295

Closed stweil closed 1 year ago

stweil commented 2 years ago

I am currently reworking the build process to support more modern Linux distributions and Python versions. Ideally, the full test matrix of Ubuntu LTS versions and Python versions from 3.6 to 3.10 should build without errors.

My changes address several issues:

This PR is not meant to be merged. It is meant for discussing the different aspects, with the goal of reaching a consensus on the right solution.

stweil commented 2 years ago

I wonder whether we really need export PIP ?= pip3 and would prefer to replace $(PIP) by a simple pip. @bertsky, you added that macro in commit 7807ae609458b847c609c3a133467370034e3363. Are there use cases where it is helpful? The variable is also exported. Are there submodules which need it? I found that core and cor-asv-ann use PIP, but it looks like they don't need it either.

kba commented 2 years ago

> I wonder whether we really need export PIP ?= pip3 and would prefer to replace $(PIP) by a simple pip

We don't need it any more. We did need it when we still had to support Python 2.7 under certain circumstances. In fact, using anything but pip/pip3 for $(PIP) could cause inconsistencies if people override it with a pip that belongs to a different minor version of Python than the venv.
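
The mismatch described here can be detected before installing anything. The following is a minimal sketch, not part of ocrd_all: it assumes the text printed by `pip --version`, which ends in "(python X.Y)", and compares that version with the running interpreter. The function names and the sample string are hypothetical.

```python
import re
import sys


def pip_python_version(version_output):
    """Return (major, minor) of the Python that a pip executable belongs to.

    Expects the text printed by `pip --version`, which ends in "(python X.Y)".
    """
    match = re.search(r"\(python (\d+)\.(\d+)\)", version_output)
    if match is None:
        raise ValueError(f"unrecognized pip --version output: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def pip_matches_interpreter(version_output, interpreter=None):
    """Check whether pip and the interpreter agree on major.minor."""
    if interpreter is None:
        # Default: the Python that runs this script (e.g. the venv's python).
        interpreter = tuple(sys.version_info[:2])
    return pip_python_version(version_output) == tuple(interpreter)


# Hypothetical sample output; a real check would capture `pip --version`.
sample = "pip 22.0.4 from /path/to/venv/lib/python3.7/site-packages/pip (python 3.7)"
print(pip_matches_interpreter(sample, (3, 7)))   # True
print(pip_matches_interpreter(sample, (3, 10)))  # False
```

Invoking pip as `python -m pip` avoids the problem altogether, because pip then always runs under the interpreter that was named.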

stweil commented 2 years ago

Now 3 of 5 Python versions complete the build: https://github.com/stweil/ocrd_all/actions/runs/1993530591.

The remaining failures occur when building ocrd_kraken (see pull request #288 which I included here) and might be fixed in the CI running now.

stweil commented 2 years ago

Now all builds pass.

stweil commented 2 years ago

The updated branch still passes for all builds: https://github.com/stweil/ocrd_all/actions/runs/2027885009.

stweil commented 2 years ago

Latest run passes all builds: https://github.com/stweil/ocrd_all/actions/runs/2029562843.

stweil commented 2 years ago

@bertsky, @kba, I compared the build rules here with those from git master by installing the complete make all results into two separate virtual environments. The result shows that my build rules reduce the disk usage significantly, mainly by reducing the total number of required virtual environments from four to two (one main venv, one sub venv). That should also result in faster builds and smaller Docker images, and so fix the time problems in CircleCI, where the image uploads account for a significant part of the run time.

stweil@ocr-02:~/src/github/OCR-D$ du -msc venv3.7-20220331-*
11605   venv3.7-20220331-master
6876    venv3.7-20220331-tensor
18480   total
stweil@ocr-02:~/src/github/OCR-D$ du -msc venv3.7-20220331-*/*
74  venv3.7-20220331-master/bin
594 venv3.7-20220331-master/build
2   venv3.7-20220331-master/build.log
11  venv3.7-20220331-master/include
2985    venv3.7-20220331-master/lib
1   venv3.7-20220331-master/lib64
3   venv3.7-20220331-master/libexec
1   venv3.7-20220331-master/locale
1   venv3.7-20220331-master/pyvenv.cfg
1   venv3.7-20220331-master/requirements.txt
103 venv3.7-20220331-master/share
7837    venv3.7-20220331-master/sub-venv
74  venv3.7-20220331-tensor/bin
594 venv3.7-20220331-tensor/build
1   venv3.7-20220331-tensor/build.log
11  venv3.7-20220331-tensor/include
4006    venv3.7-20220331-tensor/lib
1   venv3.7-20220331-tensor/lib64
3   venv3.7-20220331-tensor/libexec
1   venv3.7-20220331-tensor/locale
1   venv3.7-20220331-tensor/pyvenv.cfg
1   venv3.7-20220331-tensor/requirements.txt
103 venv3.7-20220331-tensor/share
2087    venv3.7-20220331-tensor/sub-venv
18480   total
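
The size of the saving follows directly from the du figures above. A small sketch of the arithmetic, with the four relevant lines copied from that output (sizes in MB); the helper name parse_du is hypothetical:

```python
def parse_du(text):
    """Parse `du -m` style lines ("<size> <path>") into {path: size_in_MB}."""
    sizes = {}
    for line in text.strip().splitlines():
        size, path = line.split(None, 1)
        sizes[path.strip()] = int(size)
    return sizes


# Lines copied from the du output in this comment.
du_output = """
11605   venv3.7-20220331-master
6876    venv3.7-20220331-tensor
7837    venv3.7-20220331-master/sub-venv
2087    venv3.7-20220331-tensor/sub-venv
"""

sizes = parse_du(du_output)
master = sizes["venv3.7-20220331-master"]
tensor = sizes["venv3.7-20220331-tensor"]

saved = master - tensor          # total difference in MB
percent = 100 * saved / master   # relative to the master build
sub_saved = (sizes["venv3.7-20220331-master/sub-venv"]
             - sizes["venv3.7-20220331-tensor/sub-venv"])

print(f"total:    {saved} MB saved ({percent:.1f}%)")
print(f"sub-venv: {sub_saved} MB saved")
```

This gives a total saving of 4729 MB (about 40.7%). The sub-venv directory shrinks by 5750 MB, while lib in the main venv grows from 2985 MB to 4006 MB, so part of the content moves into the main venv rather than disappearing.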

stweil commented 2 years ago

Now I also have timing and size results for a local make docker-maximum-cuda-git (single-threaded):

OCR-D:master
0.60user 0.55system 56:52.93elapsed 0%CPU (0avgtext+0avgdata 64456maxresident)k
3848inputs+3024outputs (39major+10597minor)pagefaults 0swaps
ocrd/all                  maximum-cuda-git   d737c47c1285   2 hours ago     32.8GB

stweil:tensorflow
0.71user 0.58system 45:54.85elapsed 0%CPU (0avgtext+0avgdata 67572maxresident)k
2280inputs+2712outputs (5major+11208minor)pagefaults 0swaps
ocrd/all                  maximum-cuda-git   a8fc92d9f811   2 minutes ago   23.7GB
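
For comparison, the elapsed times and image sizes above can be put into relative terms. A minimal sketch, assuming the elapsed field uses the [h:]mm:ss.ss format printed by time; the helper name elapsed_seconds is hypothetical, and all figures are copied from the output above:

```python
def elapsed_seconds(stamp):
    """Convert an elapsed time like "56:52.93" ([h:]mm:ss.ss) to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed build times and image sizes copied from the results above.
master_time = elapsed_seconds("56:52.93")
branch_time = elapsed_seconds("45:54.85")
master_size_gb = 32.8
branch_size_gb = 23.7

time_saved = master_time - branch_time
time_percent = 100 * time_saved / master_time
size_saved = master_size_gb - branch_size_gb
size_percent = 100 * size_saved / master_size_gb

print(f"build time: {time_saved:.0f} s shorter ({time_percent:.1f}%)")
print(f"image size: {size_saved:.1f} GB smaller ({size_percent:.1f}%)")
```

With these numbers the build is about 11 minutes (19.3%) shorter and the image about 9.1 GB (27.7%) smaller. Both builds were single runs on one machine, so the timing difference is only an indication.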

stweil commented 1 year ago

This proof of concept is no longer up to date, so it can be closed.