VisionSystemsInc / vsi_common

A set of common scripts used by VSI on numerous projects. See https://visionsystemsinc.github.io/vsi_common
http://visionsystemsinc.com/
MIT License

New Just rewrite #436

Open andyneff opened 2 years ago

andyneff commented 2 years ago

It's time to add brand-new, 5-year-old Docker technology to new_just: BuildKit.

andyneff commented 1 year ago

Expanding on the /venv changes:

The pipenv cache is not needed in the final stage, as pipenv install should happen in the pipenv stage. The only caching that may happen in the final stage is the pip cache at run time, and there's no need to pre-copy a cache over for that case.

We can stop COPYing the venv dir to the final stage if:

  1. After just build (well, at the end of it) we run alpine and rsync (with --chown) the /venv folder from the pipenv_cache stage to a volume
    1. We will mount that volume in as /venv during just run
  2. In the pipenv stage:
    1. Use /cache for caching. Probably make it its own volume too, for the pipenv stage only. There's really no need to ever purge it. (Advanced idea: make this computer-global instead of project-specific. Not sure how I feel about that yet.)
    2. This will need to be added to the just_entrypoint dir list to get chowned properly
    3. I don't think a RUN --mount type=cache will work here, because there would be no way to reuse this at pipenv lock time
  3. In the final stage:
    1. The cache dir can remain at /venv/cache, but it would be empty. This only assists the pip command.
    2. Some way to prevent a dev from trying to lock in the final stage would be nice. Right now the just file redirects them to the correct stage iff they run just pipenv lock, but not if they run just shell and then lock from inside. No ideas on this currently (making the lock file read-only sounds like a pain and offers no explanation; I think it's best to just hope for the best here).
  4. The deploy stage:
    1. It will copy the venv dir over like we currently do, but not the cache, as that is not needed in the deploy scenario
    2. The cache dir can remain in /venv/cache just in case
  5. Now that the volumes are no longer being populated by Docker's "empty volumes copy contents from the image" trick, the build target will need to copy the contents over after compose build, using a small image (like alpine) that rsyncs the contents over, as sketched below. This should hopefully be just as slow/fast the first time, and much faster on subsequent runs when the contents of the virtualenv have not changed.
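
A minimal sketch of that post-build sync step. The image tag, volume name, and UID:GID here are placeholders, not the real new_just values, and it assumes the pipenv_cache stage is alpine-based:

# Hypothetical: run the pipenv_cache stage image with the venv volume mounted,
# and rsync the image's /venv into it (installing rsync first if missing)
docker run --rm \
  -v "myproject_venv:/venv_volume" \
  myproject:pipenv_cache \
  sh -c 'command -v rsync >/dev/null || apk add --no-cache rsync
         rsync -a --delete --chown=1000:1000 /venv/ /venv_volume/'

rsync's --delete makes the volume exactly match the image on repeat runs, which is where the subsequent-run speedup comes from.
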
andyneff commented 1 year ago

Add default GPU flags, maybe a GPU question

# NVIDIA GPU devices and capabilities
: ${${JUST_PROJECT_PREFIX}_NVIDIA_VISIBLE_DEVICES="all"}
: ${${JUST_PROJECT_PREFIX}_NVIDIA_DRIVER_CAPABILITIES="gpu"}
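
For concreteness, assuming JUST_PROJECT_PREFIX expands to MYPROJ (a placeholder project prefix), those defaults behave like:

# Hypothetical expansion for JUST_PROJECT_PREFIX=MYPROJ; the NVIDIA container
# runtime would pick these up as NVIDIA_VISIBLE_DEVICES/NVIDIA_DRIVER_CAPABILITIES
: ${MYPROJ_NVIDIA_VISIBLE_DEVICES=all}
: ${MYPROJ_NVIDIA_DRIVER_CAPABILITIES=gpu}
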
andyneff commented 1 year ago

Abandoned Pip venv idea:

What if we never built the venv in the docker image at all, and instead built it like we do when we compile source code?

The only way I could see this working is to add it as part of just sync, which leads to the following problems:

  1. If we started anew every sync, it would be very slow every time we ran just sync: all the packages would have to be downloaded again, and any python packages that need compiling would have to be recompiled, even if nothing changed in the requirements files.
  2. If we maintain a pip cache, we could save on some of that time, however:
    1. There would still be a chunk of time wasted on rebuilding the venv every time we sync.
    2. ❗⚠️❗Currently, if an image builds, it contains all the packages we need. So even if a third-party site outside of PyPI takes down a wheel we downloaded from it, we are not stuck in our tracks: the wheel is in the image. While the wheel is (probably) buried in pip's HTTP cache, digging it out is difficult, and there is a high chance that someone debugging a build problem will clear their volumes, and with them, the wheel. By contrast, they would not normally delete the old tagged image from the last successful build, nor run docker image prune in the process.
    3. ❗⚠️❗This can introduce "it worked for me" problems, because a package that is compiled into a wheel might be cached in the pip wheel cache. E.g. (see the illustration after this list):
      1. If I change an environment variable, but not the version number on a package, pip will automatically pick the cached wheel and not recompile. This wouldn't happen in a docker build, because that cache always starts fresh.
      2. If we are compiling our own packages from a submodule, we won't typically update the version numbers for every change, leading to possible cache false hits.
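
To illustrate that false hit (mypkg and CUDA_HOME are hypothetical placeholders):

# Hypothetical wheel-cache false hit: pip keys the wheel cache on the
# package source, not on the build environment
export CUDA_HOME=/usr/local/cuda-11
pip install mypkg==1.2.3                    # compiles a wheel, caches it
export CUDA_HOME=/usr/local/cuda-12         # build input changed, version did not
pip install --force-reinstall mypkg==1.2.3  # reuses the stale cached wheel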

Bullet iii does affect the original plan to use RUN --mount type=cache mentioned in the previous post. Mainly, we should be more selective about which parts of the pip cache we let docker build cache, although the HTTP download cache should be safe to cache.
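
One hedged way to be that selective, assuming pip's default cache layout (an http dir for downloads, a wheels dir for locally built wheels):

# Sketch only: at the end of the cached install step, keep the HTTP download
# cache across builds but drop locally built wheels so they always rebuild
pip install -r requirements.txt
rm -rf ~/.cache/pip/wheels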

Overall, continuing to use docker build to make our venvs seems like our best bet.

decrispell commented 1 year ago

What would be the benefit(s) of not building the venv in the docker image? If we wanted to avoid the "fake_package" ugliness, could we somehow break the requirements.in (in the case of pip-tools) into two parts, one containing the "normal" dependencies that can be installed during the docker image build, and another containing just the editable dependencies that get added later once the source code volume is available?

I'm not sure if that's possible or not. If not, maybe some other method of breaking up the venv dependencies into "normal" and editable dependencies is possible?

andyneff commented 1 year ago

We shouldn't do that, because we lose the ability to track the dependencies of our editables, which is something we really want.

I would have to test this, but one possibility would be to have a script that:

  1. Greps the requirements.txt file, removes all editables, and syncs against that. This is pretty simple and should work for nearly all situations.
  2. We would have to add a step to the entrypoint that checks the requirements.txt file, now grepping for just the editables, and pip-syncs that (see the sketch after this list).
    1. I'm hoping this takes less than a second and will be negligible, and we might just leave it at that. This means it will run every time you start a container.
    2. If it takes longer, say 10-30 seconds, we would have to add some sort of check to see if this has already been done on this venv volume. Probably creating a hidden file as a flag to say "Yep, did that already", so this will only be done every time the volume is cleared (which happens on just sync).
    3. I can't imagine this taking as long as the original, unedited requirements.txt file, but that would be the worst-case scenario.
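
A rough sketch of that split (file names are assumptions; note the entrypoint half uses plain pip install --no-deps -e instead of pip-sync, since pip-sync against only the editables would uninstall everything else):

# Build time: strip editables and sync everything else into the venv
grep -v '^-e ' requirements.txt > requirements_no_editables.txt
pip-sync requirements_no_editables.txt

# Entrypoint: install only the editables; their pinned deps are already synced
grep '^-e ' requirements.txt | sed 's/^-e //' | xargs -r -n1 pip install --no-deps -e
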
decrispell commented 1 year ago

Are there any advantages of that solution over fake_package? It doesn't sound any cleaner to me.

andyneff commented 1 year ago

The advantages, if I can pull it off, should be that:

  1. It would be one thing that new_just adds to the Dockerfile and you don't need to update it for every editable package.
  2. It would mean you no longer need to copy the setup.py files into the Docker image just for this
andyneff commented 1 year ago

Our tests with pip-tools just showed:

We can indeed use the requirements.txt file as a constraint file to install only a partial environment:

pip install -c requirements.txt numpy torch pip-tools 

And we will probably be able to add --require-hashes once we start using hashes.

This will give us the flexibility to easily install "install dependencies" for non-PEP-517 projects, as well as pip-tools itself, so we can track and use pip-tools.
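
For example, bootstrapping a hypothetical non-PEP-517 editable might look like this (numpy standing in as its install-time dependency, and legacy_pkg as a placeholder project):

# Hypothetical: install pinned build deps first, then the legacy package
# without build isolation so its setup.py can see them
pip install -c requirements.txt pip-tools numpy
pip install --no-build-isolation -e ./src/legacy_pkg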