Optimize the base Docker image used for hxltm-action

fititnt / hxltm-action

[non-production-ready] Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)

https://hxltm.etica.ai/

The Unlicense


Optimize the base Docker image used for hxltm-action #1

Closed: fititnt closed this issue 2 years ago

fititnt commented 2 years ago

The current Dockerfile uses FROM python:3.9-bullseye (Debian, a bigger base image), but libhxl-python can run on Alpine (see https://github.com/HXLStandard/hxl-proxy/blob/main/Dockerfile).

While it is possible to manually copy hxltmcli.py (and hxltmdexml.py), the point here is to refactor the hdp-toolchain to remove extra dependencies from https://github.com/EticaAI/HXL-Data-Science-file-formats/blob/main/requirements.txt:

# (...)
#
# When working with urnresolver. Not used by others
cryptography
keyring

I'm almost sure it was cryptography that made the Alpine build fail, so both dependencies in https://github.com/EticaAI/HXL-Data-Science-file-formats must become optional AND the documentation should be updated. Then we could optimize the Docker base image here.

This doesn't actually block any functionality, but it would allow a bit more speed-up.
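One common way to make dependencies such as cryptography and keyring optional is to import them only inside the code path that needs them, so the core CLI still starts on an image where they were never installed. The sketch below is a minimal illustration under that assumption, not the actual hdp-toolchain code; the helper names optional_import and is_available are hypothetical.

```python
import importlib
import importlib.util
from types import ModuleType


def is_available(module_name: str) -> bool:
    """Check whether an optional top-level module is installed, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def optional_import(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency lazily, only when `feature` is actually used.

    If the module is missing (e.g. on a slim Alpine image), raise an ImportError
    that tells the user how to install it, instead of failing at startup.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is only needed for {feature}; "
            f"install it with: pip install {module_name}"
        ) from err


# Hypothetical usage: only the urnresolver code path would ever call this,
# so hxltmcli.py / hxltmdexml.py never touch keyring or cryptography.
# keyring = optional_import("keyring", "urnresolver")
```

On the packaging side, the same split is usually expressed with a setuptools extra (for example extras_require={"urnresolver": ["cryptography", "keyring"]}), so that a plain install skips them and users who need urnresolver opt in explicitly.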

fititnt commented 2 years ago

The Dockerfile now uses FROM python:3.9-alpine instead of FROM python:3.9-bullseye. This is a much smaller base image and should help a lot with the initial download when images are not cached.

Trivia: GitHub Actions runners actually seem to run just as fast with large Debian images as they would with Alpine.

(Screenshot from 2021-11-07 06-47-37)

Actually, the new image ran 1 second slower. I'm not sure whether this is because it was the first run (so no previous image was cached), or because GitHub is so optimized for Debian-based images (the documentation recommends Alpine first, then Debian) that, even if there really were no option to use Alpine, Debian would still be okay.

fititnt commented 2 years ago

Ok. GitHub Actions does not support richer input types such as arrays or objects; the most complex type is a string. So we literally have to convert whatever the user wrote in the YAML back into some format.

Since we're now optimized for Alpine, this also means we're using a near-POSIX shell called Ash, which has fewer features than Bash (which itself is already less feature-complete than most programming languages).

So... entrypoint.sh will get a bit verbose. But this type of code is likely to stay compatible more or less forever, since we're literally using standards that are 20 to 40 years old.
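The string-to-list conversion described above can be sketched as follows. For Docker container actions, GitHub passes each declared input to the container as an environment variable named INPUT_ followed by the uppercased input name (spaces become underscores). The real entrypoint.sh does this in Ash; the Python version below only illustrates the idea, and both the function name parse_list_input and the "one value per line, or comma-separated" convention are assumptions, not the action's documented behavior.

```python
import os
from typing import List, Mapping, Optional


def parse_list_input(name: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Turn a string action input back into a list of values.

    Hypothetical convention: users write one value per line (e.g. with a YAML
    `|` block scalar) and/or comma-separated values; blank entries are dropped.
    """
    env = os.environ if env is None else env
    # GitHub exposes each input as INPUT_<NAME>: uppercased, spaces -> underscores.
    raw = env.get("INPUT_" + name.replace(" ", "_").upper(), "")
    items: List[str] = []
    for line in raw.splitlines():
        for part in line.split(","):
            part = part.strip()
            if part:
                items.append(part)
    return items
```

For example, a workflow step with a hypothetical input languages set to a block scalar containing "por-Latn, eng-Latn" on one line and "spa-Latn" on the next would yield ["por-Latn", "eng-Latn", "spa-Latn"]. Doing the same splitting with only POSIX shell features (IFS, read, case) is what makes entrypoint.sh more verbose.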

fititnt commented 2 years ago

We will close this issue. hxltm-action will stay on Alpine for good.

The idea of trying new GitHub Actions for different purposes in the same repository is a bad one: it is less stable (even though using @main is explicitly not recommended, we may still use it for other test projects) and it leads to bigger images.

Issue #6 is the place for new CLI tools not related at all to HXL.