compdemocracy / polis

:milky_way: Open Source AI for large scale open ended feedback
https://pol.is
GNU Affero General Public License v3.0
780 stars 183 forks source link

Prevent stale container build artifacts #1602

Open metasoarous opened 1 year ago

metasoarous commented 1 year ago

In the docker-compose.dev.yml overlay we use volumes mappings as a developer convenience so that local source files are mirrored in the running docker containers, enabling code reloading without having to rebuild or restart containers. In particular, server has the mapping .server:/app, and the math container has ./math:/app.

This is great, but can introduce subtle bugs when rebuilding images after either the base docker image updates (the os/linux version, system level deps, or the node/jvm version). In the case of node apps, because the node_modules directory is being mapped back and forth between the host system and the container directory via the volumes mapping described above, when rebuilding the server image, the old node_modules directory ends up making its way into the container, and stale build artifacts there can fail to be rebuilt. When these artifacts point to things in the old system (such as paths for native system level libraries, as might be required for pg-native, for example), these npm modules will fail to work at runtime because the native libraries (.so files and such) they're expecting to find won't be where it's expecting. A similar phenomena can occur with the math worker and the math/.cpcache (class path cache) directory.

Some approaches to fixing this:

I'm still fuzzy on the lifecycle implications of the first approach, but one advantage is that it would prevent the directories from being mapping onto the local system at all, which would allow individual components to be run directly on the host system, and not interfere with the artifacts built against the docker images.

This just bit me while working on #1600, since I kept getting error messages saying that node couldn't find libc-musl (Error: libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory), and eventually figured out (after nuking my local node_modules directory) that it was still expecting to find this native library (required for pg-native) in the location one would expect to find it on alpine-linux :facepalm: I've encountered similar things in the past with both the server and math containers, but never quite identified that it was the volumes mappings biting us, given that these directories are in the .dockerignore files. Live and learn...

metasoarous commented 1 year ago

Just tried adding the mappings described in my first option above, and while it does work, it also still creates a empty directories for these files with root ownership and write permissions, meaning that if you try to run/install things locally, it fails unless you run with sudo (which obviously isn't tenable). If I create the directories first using mkdir (with my user as owner), the directories get left alone by docker-compose. However, I have to manually take that step first. It's possible we could go this route and add a step somewhere in the Makefile to create these directories before running anything, but it'll take some finessing to get it right.

metasoarous commented 1 year ago

Probably the thing that makes more sense is to sort out the permissions so that the user things get created under by docker has the same USER_ID as the host system's user. For now it may still be worth merging these changes, since it should still be an improvement on what is happening currently, though we may still need to add an explicit rm ... in the docker build process prior to installing dependencies to make sure that these volumes get cleared out when a reinstall happens. But I'm a little on the fence about this, because it will mean longer build times. So it may be more appropriate to just make sure that these volumes are being cleared when we do a FULL-REBUILD.

metasoarous commented 1 year ago

@ballPointPenguin and I just discussed this (it's bitten us both recently now), and we're generally in agreement that it's probably best to just always rm the node_modules (etc.) directories as part of the typical Dockerfile build process, right before npm install (etc.) is run. This does mean that every time you add a dependency in development (assuming using Docker for dev) that the entire set of node_modules or other dependencies will need to be rebuilt, and we'll ultimately have to see how much time this rebuild takes. If it's not a terrible amount of time, it is probably worth it. But it's also possible that for some of the native modules (the exact ones that are creating this problem in the first place), the rebuilds take quite a while, and that having to do this to update any dependency in dev will take more time than we're willing to accept, in which case we can revisit the issue and come up with another mechanism for clearing this state, and dev documentation for highlighting this potentiality and solution(s).