t-lo opened this issue 1 year ago (status: Open)
Hi @t-lo :wave: Currently we don't publish images with models included (although, as you've found out, we have a process to build images with them).
We'd welcome a PR that publishes models-included images to ghcr.
@t-lo we faced the same issue. The best solution we found is to pre-build our own image like this:
FROM libretranslate/libretranslate:v1.3.8 AS models_cache

ARG with_models=true
# Declare the models build arg; it was referenced below but never declared.
ARG models=""

USER libretranslate
WORKDIR /app

# Initialize the language models (all of them, unless $models narrows the list).
# Note: the inner "fi" needs a trailing ";" before the line continuation,
# otherwise the joined line "fi fi" is a shell syntax error.
RUN if [ "$with_models" = "true" ]; then \
        if [ ! -z "$models" ]; then \
            ./venv/bin/python install_models.py --load_only_lang_codes "$models"; \
        else \
            ./venv/bin/python install_models.py; \
        fi; \
    fi

RUN ./venv/bin/pip install . && ./venv/bin/pip cache purge

FROM models_cache AS final

RUN rm -rf /tmp/prometheus_data && mkdir -p /tmp/prometheus_data
ENV PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_data

ENTRYPOINT [ "./venv/bin/libretranslate" ]
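The Dockerfile above can then be built with `--build-arg` to control which models get baked in. A minimal sketch; the image tag and the "en,nl" language list are illustrative assumptions, not values from this thread:

```shell
# Sketch: build the multi-stage Dockerfile above into a models-included image.
# Tag and language codes are placeholders; adjust to your deployment.
build_cmd='docker build --build-arg with_models=true --build-arg models=en,nl -t libretranslate-models:v1.3.8 .'
# Printed rather than executed, so the sketch works without a Docker daemon:
echo "$build_cmd"
```

Omitting `--build-arg models=...` installs all languages, which is where the image size (and the CI disk-space pressure mentioned below) comes from.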
@skamenetskiy I've been using a tailored GitHub Actions workflow to do the same and to publish to ghcr (see issue description). I've also started dabbling with a generalised GitHub Actions workflow for an upcoming PR to the LibreTranslate repo. While my work is functional, it fails every now and then because the GitHub Actions runners run out of disk space :sweat_smile: so I'm currently investigating options to use less storage.
Hi, Not entirely related to container images containing the models, but if it can help anyone:
The way I setup LibreTranslate is with 2 volumes, one for argos-translate's cache and the other one for packages. I have not been running it for a long time so maybe something will break on the next update. Note: I'm running it in Kubernetes (k0s).
By using persistent volumes the app does not have to download models every time. I guess it fetches the index, sees that nothing has changed and continues.
Here are the important pieces of the Deployment:
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      ...
      securityContext:
        fsGroup: 1032
      containers:
        - name: libretranslate
          ...
          volumeMounts:
            - mountPath: "/home/libretranslate/.local/share/argos-translate/packages"
              name: packages
            - mountPath: "/home/libretranslate/.local/cache/argos-translate"
              name: cache
Because the app runs with user privileges, I had to adjust the fsGroup. The value 1032 comes from the Dockerfile. Note: as a StorageClass I chose CephFS.
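The `volumeMounts` above need matching `volumes` entries and backing PersistentVolumeClaims. A minimal sketch: the claim names, sizes, access mode, and the `cephfs` StorageClass name are all assumptions, not values from the original comment:

```yaml
# Hypothetical claims backing the "packages" and "cache" mounts above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: libretranslate-packages
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: cephfs
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: libretranslate-cache
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: cephfs
  resources:
    requests:
      storage: 10Gi
```

In the Deployment's pod spec, the claims would then be wired up under `spec.template.spec.volumes`:

```yaml
      volumes:
        - name: packages
          persistentVolumeClaim:
            claimName: libretranslate-packages
        - name: cache
          persistentVolumeClaim:
            claimName: libretranslate-cache
```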
Would mounting /home/libretranslate/.local (which has 5.8 GB under cache and 6.5 GB under share) be a viable option? It would be a one-time download, and future restarts would be fine. I'm not sure what happens when you start with a new version, though. Is this path updated periodically by libretranslate?
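Mounting the whole `.local` directory could be sketched as a single bind mount, e.g. in Compose (the host path is a placeholder; the container path is taken from the mounts discussed above):

```yaml
services:
  libretranslate:
    image: libretranslate/libretranslate:latest
    ports:
      - "5000:5000"
    volumes:
      # One persistent mount covering both cache/ and share/ in one go.
      - ./libretranslate-data:/home/libretranslate/.local:rw
```

One caveat with this broader mount: `.local` may also contain state unrelated to argos-translate, so the two narrower mounts shown earlier are the more conservative choice.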
I updated Argos Translate to delete the model packages after they've been installed to help clean up the cache.
https://github.com/argosopentech/argos-translate/commit/377b7b42317fb917aaf1c2992437b8a3d4fb516a
This should help to reduce the size of the Docker image.
This is exactly what we need within Docker Compose: a download and installation of those packages upon starting the container if they do not exist in the specified volumes, instead of during the building process. This approach saves a significant amount of space and makes the Docker image more scalable.
I've managed to set up this Docker Compose file; it's not ideal, but it does the job without needing to build the image yourself.
Change load_only_lang_codes within the entrypoint to add languages, or remove the argument to install all languages.
I haven't touched Python much, so there might be some errors.
version: '3'

services:
  libretranslate:
    image: libretranslate/libretranslate:latest-cuda
    container_name: libretranslate
    working_dir: /libretranslate
    restart: unless-stopped
    ports:
      - "5000:5000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ./packages:/home/libretranslate/.local/share/argos-translate/packages:rw
      - ./cache:/home/libretranslate/.local/cache/argos-translate:rw
    entrypoint: bash -c 'python3 ../app/scripts/install_models.py --load_only_lang_codes "nl,en" && exec libretranslate --host 0.0.0.0'
    healthcheck:
      test: ['CMD-SHELL', 'python3 ../app/scripts/healthcheck.py']
This has been requested before: is there a container image available somewhere with language models included?
I've been reading through https://github.com/LibreTranslate/LibreTranslate/issues/155 but failed to find actual images (that include models) on Docker Hub or ghcr. On ghcr in particular there seem to be no images available at all.
Benefits of a docker image with models included
Drawbacks
I've set up a build workflow for my own purposes (I operate fromm.social, a Mastodon micro-instance) and am currently publishing a container with models included on ghcr: https://github.com/users/t-lo/packages/container/package/libretranslate . It's fed by a GitHub Actions workflow (https://github.com/t-lo/LibreTranslate-ghcr-publisher) which I currently trigger manually.
I'd be willing to add this to the existing LibreTranslate GitHub Actions workflow with a PR, if there's an appetite for that. Kindly let me know how to proceed, and whether you're fine with using ghcr to publish images.