OCR-D / ocrd_calamari

Recognize text using Calamari OCR and the OCR-D framework
Apache License 2.0
13 stars 6 forks source link

Review tests #72

Open mikegerber opened 2 years ago

mikegerber commented 2 years ago
mikegerber commented 2 years ago
* [ ]  I also noticed an issue where I had an old checkout of `repo/assets` which would NOT trigger test fails but a fresh checkout would
* [ ]  `test/assets` are copied once and are then reused

@bertsky Reading https://github.com/OCR-D/ocrd_tesserocr/blob/master/Makefile#L124-L135, I think we should discuss how to handle {repo,test}/assets better?

bertsky commented 2 years ago

Reading https://github.com/OCR-D/ocrd_tesserocr/blob/master/Makefile#L124-L135, I think we should discuss how to handle {repo,test}/assets better?

Yes, I have always argued against that pattern – precisely because of old checkouts. The proper way to handle this is via subrepos, which can be updated. See for example repo/assets in core.

bertsky commented 2 years ago
  • test/assets are copied once and are then reused

Yes, but the dependency to repo/assets ensures that it is in sync with upstream. (The problem is the rule for the latter, see above.)

  • We reuse the temporary workspace which is also potentially a problem

That's not true. Each test case is independent from the others – the fixture makes a fresh copy of test/assets under /tmp.

mikegerber commented 2 years ago
  • We reuse the temporary workspace which is also potentially a problem That's not true. Each test case is independent from the others – the fixture makes a fresh copy of test/assets under /tmp.

Yeah, I didn't remember writing it correctly :) My fixture code leaves the temporary directory though, and I think I'm going to rewrite it to use mktemp and only leaving it only if a fail occurs or the user wants it.

mikegerber commented 2 years ago

Tests now run on Python 3.6 to 3.10 on CircleCI (https://github.com/OCR-D/ocrd_calamari/commit/ef63291b5ada6cce173d505c566876bb9dd7010f, https://app.circleci.com/pipelines/github/OCR-D/ocrd_calamari/193/workflows/e099d41f-8eb7-4f8c-b605-4143a79ba52e).

mikegerber commented 1 year ago

We now install an older opencv-python-headless using a wheel (pip install --prefer-binary opencv-headless) for Python 3.6. This avoids the major pain of having to compile OpenCV for this version and so cuts down test time from >30 minutes to <5 minutes. The price paid for this is that we don't test with the latest OpenCV, but as Python 3.6 is EOL, I can live with it.

bertsky commented 1 year ago

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

mikegerber commented 1 year ago
* [ ]  Why is the CircleCI result not accessible/private?

@kba Do you have an idea how to resolve this? I checked the project settings but couldn't find anything. To reproduce open a private browser window (= not logged in to CircleCI) and try:

😡 https://app.circleci.com/pipelines/github/OCR-D/ocrd_calamari gives 404 ✔ https://app.circleci.com/pipelines/github/OCR-D/core - for comparison - gives the build results

mikegerber commented 1 year ago

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

Keep me posted if you adopt this. I currently use this in CircleCI config but could image moving it to the Makefile if that's the convention:

      - run: if python3 --version | grep -q "Python 3.6"; then pip install --prefer-binary -U opencv-python-headless numpy; fi  # to avoid compilation

I believe there would also be line in requirements.txt that could do the same, but I didn't test this:

opencv-python-headless, opencv-python-headless == 4.6.0.66 ; python_version < "3.7"

But then I would really like if everyone would move away from Ubuntu 18.04, if possible.

mikegerber commented 1 year ago

That's a nice recipe. Should adopt that in all the other places where we still use Python 3.6

It gets pulled in by ocrd so affects every processor - just had the same issue with dinglehopper (resolved with the same workaround).

bertsky commented 1 year ago

Agreed, we should add this to core.

bertsky commented 1 year ago

But then I would really like if everyone would move away from Ubuntu 18.04, if possible.

we are not Ubuntu 20 / Python 3.7 now in core/ocrd_all Docker BTW. (There's been no notification unfortunately.)

kba commented 1 year ago

(There's been no notification unfortunately.

Notification in the chat coming once I get around to fixing the base image issue and do next release.

@kba Do you have an idea how to resolve this? I checked the project settings but couldn't find anything. To reproduce open a private browser window (= not logged in to CircleCI) and try:

😡 https://app.circleci.com/pipelines/github/OCR-D/ocrd_calamari gives 404 ✔ https://app.circleci.com/pipelines/github/OCR-D/core - for comparison - gives the build results

I've compared the project settings for both projects and could spot no difference. I'll try rotating the GitHub keys, perhaps there are stale permission settings linked to that.

mikegerber commented 1 year ago

Run scheduled tests (There have been subtle changes, e.g. in OCR-D, that changed the filenames of created files)

I set up a monthly schedule. (Needs to be done by "hand" unfortunately)

bertsky commented 1 year ago

pip install --prefer-binary -U opencv-python-headless numpy

I'm going to add a line

    if $(PYTHON) -V | fgrep -e 3.5 -e 3.6; then $(PIP) install --prefer-binary -U opencv-python-headless numpy; fi

to the recipe of make install in core – so that it will end up in all sub-venv, Docker and CI targets.

@kba Is it ok to add this to https://github.com/OCR-D/core/pull/986?

kba commented 1 year ago

@kba Is it ok to add this to OCR-D/core#986?

Yes, please. The opencv build time is enormous for fresh python 3.6- installations, great if we can reduce that.