NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Check models compatibility in Docker rebuild GH Actions workflow #719

Open juhoinkinen opened 1 year ago

juhoinkinen commented 1 year ago

The recently added GH Actions workflow for rebuilding Docker images (#715) could also verify that the models trained on the previous image build work (results-wise) identically in the new image. It is quite undesirable that the models would work even slightly differently in different Docker image builds of the same Annif version.

These are the steps in the workflow that verify model compatibility, i.e. that the evaluation results are identical:

  1. Train models with all (trainable) algorithms using the old image (the one in quay.io with the tag being rebuilt)
  2. Evaluate the models with the old image and store the results in the file eval.prev.out
  3. Evaluate the models with the new image and store the results in the file eval.out
  4. Compare eval.prev.out and eval.out with diff, and fail the workflow if they differ, unless the checkbox for allowing differences was checked when triggering the workflow

For both training and evaluation the tests/corpora/archaeology/fulltext/ corpus is used, which I think is fine for all algorithms, although dedicated corpora could be created for this.
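The comparison in step 4 can be sketched as a plain `diff` between the two result files. This is only a sketch of the idea, not the actual workflow code: the file names come from the steps above, while `ALLOW_DIFF` is a hypothetical stand-in for the workflow's "allow differences" checkbox input, and the docker/annif evaluation commands that produce the files are omitted.

```shell
# compare_evals: fail unless the two evaluation result files are identical,
# or unless ALLOW_DIFF=true (a stand-in for the workflow's checkbox input)
# explicitly permits the difference.
compare_evals() {
    prev="$1"
    new="$2"
    if diff "$prev" "$new"; then
        echo "Results identical"
    elif [ "$ALLOW_DIFF" = "true" ]; then
        echo "Results differ, but differences are allowed"
    else
        echo "Results differ between image builds" >&2
        return 1
    fi
}
```

In a workflow step this would be called as `compare_evals eval.prev.out eval.out`, with the non-zero exit status failing the job.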

There could also be a similar workflow for checking model compatibility when preparing an Annif release, instead of doing the compatibility checks manually. The compatibility-check steps could then be moved to a separate action for reusability, like the prepare action of the CI/CD workflow.

Note: I've been working on this in my own fork, to avoid accidental image pushes to quay.io.

TODO before merge:

codecov[bot] commented 1 year ago

Codecov Report

Patch and project coverage have no change.

Comparison is base (320af2b) 99.67% compared to head (07f0af7) 99.67%.

:exclamation: Current head 07f0af7 differs from pull request most recent head 569b367. Consider uploading reports for the commit 569b367 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main     #719    +/-   ##
========================================
  Coverage   99.67%   99.67%
========================================
  Files          89       89
  Lines        6380     6380
========================================
  Hits         6359     6359
  Misses         21       21
```


sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

juhoinkinen commented 3 months ago

When #762 is merged, the upload/download functionality could be utilized for the model compatibility check. By downloading ready-trained models (maybe from the GitHub Actions cache?), this first step could be omitted:

  1. Train models with all (trainable) algorithms using the old image (the one in quay.io with the tag being rebuilt)
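A commented sketch of what the replacement step might look like, assuming the upload/download commands introduced in #762; the project pattern, image tag, and repository name are hypothetical placeholders, not values from this issue:

```shell
# Hypothetical replacement for step 1: download ready-trained projects with
# the old image instead of retraining them. <old-tag> and <REPO> are
# illustrative placeholders.
# docker run --rm -v "$PWD/data:/annif-projects/data" \
#   quay.io/natlibfi/annif:<old-tag> \
#   annif download "*" <REPO>
```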