elastic / eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
https://eland.readthedocs.io
Apache License 2.0

Reduce Docker image size from 4.8GB to 2.2GB #615

Closed: pquentin closed this 9 months ago

pquentin commented 9 months ago

Closes #600

This was done by using the CPU-only torch package (torch+cpu) and not storing the pip cache in the Docker image.
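
For readers who want to see what those two changes could look like, here is a minimal sketch. It is not the Dockerfile from this PR: the base image, the file name `Dockerfile.sketch`, and the unpinned `torch` requirement are assumptions for illustration. The `--index-url` value is the CPU wheel index that PyTorch documents, and `--no-cache-dir` is the standard pip flag for skipping the download cache.

```shell
# Hypothetical sketch: write an example Dockerfile that applies both
# size reductions. Nothing is built or downloaded by this script.
cat > Dockerfile.sketch <<'EOF'
# Assumed base image, chosen only for illustration.
FROM python:3.10-slim

# --no-cache-dir keeps pip from storing downloaded wheels in the image layer.
# The CPU-only index serves PyTorch wheels without the bundled CUDA libraries.
RUN python3 -m pip install --no-cache-dir \
    --index-url https://download.pytorch.org/whl/cpu \
    torch
EOF

echo "wrote Dockerfile.sketch"
```

Building it with `docker build -f Dockerfile.sketch .` and comparing the result in `docker image ls` against a build without these two options should show where the savings come from.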

Tested using:

docker run -it --rm --network host -e ES_USERNAME=$ES_USERNAME -e ES_PASSWORD=$ES_PASSWORD eland eland_import_hub_model --cloud-id $CLOUD_ID --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english --task-type ner

pquentin commented 9 months ago

Thanks to @dolaru, I was able to adapt run.sh to build and push an ARM image. This command worked:

ENVIRONMENT=staging RELEASE_VERSION=8.9.0 with-vault-ci bash .buildkite/release-docker/run.sh

I would appreciate it if an Elastic employee with an Apple Silicon MacBook could test the resulting image. (Ping me offline to get instructions.)

dolaru commented 9 months ago

Good stuff @pquentin! I can confirm that the ARM container image works as intended. 👍🏽

I successfully imported the elastic/distilbert-base-cased-finetuned-conll03-english model using eland_import_hub_model in that container image.