Ribbit-Network / ribbit-network-frog-hardware

The sensor for the world's largest crowdsourced network of open-source, low-cost, GHG Gas Detection Sensors.
https://www.ribbitnetwork.org/
MIT License

Investigate reducing co2 and gpsd image sizes #80

djgood closed this issue 1 year ago

djgood commented 2 years ago

See if we can reduce the sizes of the images to speed up updates, which allows a new device to come online faster and makes developing for our devices less of a waiting game! Reduces the waste of bandwidth as well ;)

An easy way that I've found to reduce image sizes is to use Docker's multi-stage builds feature. We should be able to add a build stage that starts from a fresh image and copies over only the compiled/downloaded modules from the previous stage, taking just what we need to run.

The simplest image we could create would contain just the code and its dependencies, but it may be worth keeping some tools around that are helpful for debugging (e.g. i2cdetect).
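A minimal sketch of that multi-stage split (base image, file names, and the `/venv` path are illustrative assumptions, not the project's actual Dockerfile):

```dockerfile
# --- Build stage: has pip, compilers, etc.; never ships to devices ---
FROM python:3.10-slim AS build
WORKDIR /app
COPY requirements.txt ./
# Install everything into a self-contained virtualenv we can copy out
RUN python -m venv /venv && /venv/bin/pip install -r requirements.txt

# --- Runtime stage: starts fresh, takes only what we need to run ---
FROM python:3.10-slim
WORKDIR /app
COPY --from=build /venv /venv
COPY co2.py ./
# Optionally keep debugging tools like i2cdetect on the device:
# RUN apt-get update && apt-get install -y --no-install-recommends i2c-tools
CMD ["/venv/bin/python", "co2.py"]
```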

abesto commented 2 years ago

I did an experiment to see how much we can save.

And the numbers:

Phase 2 is (at least in my buildx cross-platform build environment) ridiculously slow, so definitely not worth it for the extra 10 MB saving. However, phase 1 seems like it's worth doing? It's a 12% size decrease. Not as dramatic as you'd see with a static binary running on Alpine or whatever, but we are talking about Python. To get more, we'd probably need to start trimming the actual dependencies.

(Note, we should probably have done PYTHONDONTWRITEBYTECODE=1 on phase 1, should be zero overhead and save some space on .pyc files)
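For reference, that's a one-line addition in the build stage (sketch; the exact stage layout is assumed):

```dockerfile
# Don't write .pyc files while installing; they would otherwise be
# copied into the runtime image along with the rest of the venv
ENV PYTHONDONTWRITEBYTECODE=1
RUN /venv/bin/pip install -r requirements.txt
```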

abesto commented 2 years ago

... A quick look through the output of the install_packages we start with shows stuff like fonts and libgtk2 being installed, so I'm guessing there's some further space to be saved by tracking down packages that think we need a desktop environment, and convincing them of the error of their ways.

maggie44 commented 2 years ago

Instead of copying all the content, you could copy in just the files needed for the installs (pyproject.toml and the lock file?). That way, when you make changes to co2.py it won't bust your cache; it will only rebuild when it needs new packages in the build step:

# This will copy all files in our root to the working  directory in the container
# It's almost guaranteed to bust the Docker build cache, so do it as late as possible
COPY . ./

https://github.com/abesto/ribbit-network-frog-sensor/blob/e456650f5fdb9ac3ec98a4276b9f303c34541424/software/co2/Dockerfile.template#L48-L50
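A sketch of that ordering (file names and the poetry invocation are assumptions; the point is that dependency manifests are copied before the source):

```dockerfile
# Dependency manifests first: this layer's cache is busted only
# when the dependencies themselves change
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root

# Source last: editing co2.py only invalidates layers from here on,
# so the dependency install above stays cached
COPY co2.py ./
```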

This step isn't really going to give any advantage; the build stage's packages are disposable and never make it to the devices. Due to Docker's layering and build caching, it also won't really save space on your local machine. At the moment it just slows the build down ever so slightly while you wait for the uninstall to complete.

# Save a tiny amount of space: we don't need these at runtime
RUN pip uninstall -y wheel pip

https://github.com/abesto/ribbit-network-frog-sensor/blob/e456650f5fdb9ac3ec98a4276b9f303c34541424/software/co2/Dockerfile.template#L55-L56

Using the balena :buster-build images could also help build time, as fewer installs will be required in the build step. It will make the build-stage image bigger, but since that never lands on a device there is little reason to worry.

Would also consider locking the Python version. I think it is something like balenalib/%%BALENA_MACHINE_NAME%%-python:3.12-buster-build. Otherwise a new Python update, from 3.10 to 3.11 for example, will be pushed out to the devices when using just python:buster-build. There is usually little need to be on the edge like that; updating can be done intermittently, as and when needed. A Python update inside the container will require a lot of bandwidth to roll out across devices. Not to mention it's helpful for ensuring breaking changes don't slip in.
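Concretely, the pin is just the tag on the FROM line (the exact tag here is an assumption; check the available balenalib tags):

```dockerfile
# Pinned: devices only download a new Python when we bump this tag,
# instead of whenever upstream python:buster-build moves
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3.10-buster-build AS build
```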

Come to think of it, I would say the same for the pip --upgrade in there too. It is updating without version control. Better to pin pip and bump the version in the image deliberately than to always run on the latest, just to keep a predictable environment.
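Pinning pip works the same way as pinning any other package (version number illustrative):

```dockerfile
# Reproducible: pip is upgraded only when we bump this pin,
# not to whatever happens to be latest at build time
RUN pip install --upgrade pip==23.1.2
```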

abesto commented 2 years ago

Instead of copying all the content you could copy in just the file needed for the installs

Accurate; check out https://github.com/Ribbit-Network/ribbit-network-frog-sensor/blob/main/software/co2/Dockerfile.template, we're actually doing this!

This step isn't really going to give any advantage; the build stage's packages are disposable and never make it to the devices.

We install wheel and pip after setting up the venv, and we ship the venv to the runtime image, so surely removing them saves the space that pip and wheel take up?
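In other words, because the venv itself (including its private copies of pip and wheel) gets copied into the runtime stage, stripping them first does save real bytes. Roughly (sketch; paths and base image assumed):

```dockerfile
# Still in the build stage: pip can uninstall itself and wheel
# from the venv before we copy it out
RUN /venv/bin/pip uninstall -y wheel pip

# The runtime stage then receives the slimmed-down venv
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3.10-buster-run
COPY --from=build /venv /venv
```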

Using the balena :buster-build images could also help build time

Yep, and we're also doing that! :D

Would also consider locking the Python version. Come to think of it, would say the same for the pip --upgrade in there too.

pip I'm conflicted about, but the Python version, for sure.

maggie44 commented 2 years ago

Whoops, I was looking at the experiment repos you linked, looks like there is another one merged with a lot of the things already in it.

We install wheel and pip after setting up the venv, and we ship the venv to the runtime image, so surely removing them saves the space that pip and wheel take up?

Ah I see. Nice catch.

I tend to install with --user instead of a virtual env, which would use the global wheel package. But they are all much of a muchness. https://pythonspeed.com/articles/multi-stage-docker-python/
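The --user variant of the multi-stage pattern looks roughly like this (sketch, following the article above; base image and file names assumed):

```dockerfile
FROM python:3.10-slim AS build
COPY requirements.txt ./
# Installs into /root/.local rather than a venv
RUN pip install --user -r requirements.txt

FROM python:3.10-slim
# Copy only the user site-packages; pip/wheel stay behind globally
COPY --from=build /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
```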

keenanjohnson commented 1 year ago

We're moving to an ESP32-based frog for the foreseeable future, so closing this as won't-fix for now. Perhaps once the global supply chain clears up a bit, we will revisit this.

New software repo below:

https://github.com/Ribbit-Network/ribbit-network-frog-software