Closed Nefcanto closed 8 months ago
I do not understand your proposal. You seem to be reporting that your proposed faster alternative implementation doesn't work on account of a permission error, but we would be delighted to improve the performance for our users if that could be addressed. I attempted numerous implementations like the one you proposed which similarly didn't work. That is how I arrived at `docker save` + `docker load`.

The performance characteristics depend on the available network bandwidth, disk I/O, CPU, Docker image size, etc., so it's always wise to measure for your specific use case. Some people want to cache Docker images simply to avoid Docker Hub's rate limits or reduce reliance on third-party networks. If you control where your images are hosted, I suggest hosting them on the GitHub Container Registry (GHCR) and not caching them at all.

Please bear in mind that open-source software typically comes without any warranties of fitness for any particular purpose, so your expectations may be met with hostility in other open-source projects, particularly those with more open issues.
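For readers unfamiliar with the save/load strategy, it corresponds roughly to a workflow like the sketch below; the step names, cache key, tarball path, and image name are illustrative placeholders, not this action's actual implementation:

```yaml
# Rough sketch of the docker save + docker load caching strategy.
# The cache key, tarball path, and image name are placeholders.
- name: Restore image tarball from cache
  uses: actions/cache@v4
  with:
    path: ~/docker-images.tar
    key: docker-images-${{ hashFiles('**/Dockerfile') }}

- name: Load cached images, if any
  run: |
    if [ -f ~/docker-images.tar ]; then
      docker load --input ~/docker-images.tar
    fi

- name: Save pulled images for the next run
  run: docker save --output ~/docker-images.tar my-org/my-image:latest
```

Whether this wins over a plain `docker pull` depends on the factors above, which is why measuring with your own images matters.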
@Kurt-von-Laven, it's not your fault that `docker save` and `docker load` are slow. But the main reason for caching is not to bypass rate limits; it's universally accepted that the main reason for caching is to improve performance.

Now that you know how to create GitHub pre-built actions, I suggest that you at least add an option to choose between the save/load approach and restoring `/var/lib/docker/image`. The first option is not suitable for performance; it is, however, suitable for bypassing rate limits and similar concerns.
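If such an option were added, it might look like the following `action.yml` fragment; this input does not exist in docker-cache today, and its name and values are purely hypothetical:

```yaml
# Hypothetical input, not part of docker-cache today.
inputs:
  strategy:
    description: >
      Caching strategy: "save-load" (docker save/load tarballs) or
      "layer-store" (restore /var/lib/docker/image directly).
    required: false
    default: save-load
```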
> The main reason for caching is not to bypass rate limits; it's universally accepted that the main reason for caching is to improve performance.
In general this is certainly true, but this action doesn't cache arbitrary data, only Docker images, where the use cases I mentioned are very real alternative motivations for caching. I would again ask you to consider your tone, though, since any given project may or may not have been written with your specific use case and needs in mind.
> I suggest that you at least create an option to either select the save/load or restore the `/var/lib/docker/image`. The first option is not suitable for performance.
Again, I don't understand what you are proposing. You have shared a code sample that you state does not work, and asked me to implement something I believe to be fundamentally unworkable based on my past experience trying the same thing you suggest. We are open to performance improvements if any can be found.
@Kurt-von-Laven, thank you for your patience and kind words.
Let me explain in more detail.
I'm not an expert in Docker. As far as I know, Docker creates and manages its images in layers, and caches those layers in a default directory located at `/var/lib/docker/image`.

It's similar to NPM. When Docker wants to pull an image, it checks that directory for cached layers, and if they are there, it uses them to assemble the image locally. That's the main reason behind the `Already exists` labels we see when we execute the `docker pull` command.

So, the correct path towards improving performance is to follow the NPM strategy for caching: cache that directory.
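The lookup described above can be modeled with a toy content-addressed cache. This is an analogy only, not Docker's actual code or on-disk layout:

```shell
# Toy model of content-addressed layer caching (an analogy, not Docker's code):
# a "layer" is stored under its sha256 digest; a repeated "pull" finds it cached.
cache_dir=$(mktemp -d)
layer_data="example layer contents"
digest=$(printf '%s' "$layer_data" | sha256sum | cut -d' ' -f1)

pull_layer() {
  if [ -f "$cache_dir/$digest" ]; then
    echo "Already exists"        # layer found in the local store; no download
  else
    printf '%s' "$layer_data" > "$cache_dir/$digest"
    echo "Downloaded"            # first time: fetch and store under its digest
  fi
}

pull_layer   # first pull prints "Downloaded"
pull_layer   # second pull prints "Already exists"
```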
Based on this approach, we have two options at hand:

If we manage to do either of these two steps, we can significantly improve the performance of our pull commands.
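For concreteness, caching the layer store directly would look something like the sketch below; the cache key is a placeholder. As discussed in this thread, restoring into `/var/lib/docker` typically fails on GitHub-hosted runners because `actions/cache` runs as the unprivileged runner user while the directory is owned by root:

```yaml
# Sketch of caching Docker's layer store directly; the cache key is a placeholder.
# actions/cache runs as the unprivileged runner user, so reading and restoring
# the root-owned /var/lib/docker/image is where permission errors arise.
- name: Cache Docker layer store
  uses: actions/cache@v4
  with:
    path: /var/lib/docker/image
    key: docker-layer-store-${{ runner.os }}
```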
Thank you for the more detailed explanation. This action supports both Linux and Windows, but hopefully most of the differences between the two would boil down to the use of different file paths.
I suspect that option 1 will be slower than anticipated on GitHub-hosted runners on account of the multiple GB of cached Docker images that I would expect would need to be copied to a different directory. This might break users with running containers, which I suspect includes anyone who transitively depends on a Docker container action. It may even run some users out of disk space. It is possible that an aggressive implementation could kick off the copy in a pre-step in order to parallelize it, but this would also introduce a significant amount of complexity and possible failure modes. Existing users will be further impacted by such a change if they have large Docker images that they don't cache with this action (e.g., because they build rather than pull them), but I am open to considering it as a major version bump if it is demonstrated to be typically faster in practice and compatible with Docker container actions. What we would ideally want is a way to add an extra Docker data directory, which would be a feature request for Docker.
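For reference, Docker does let you relocate (but not augment) its data directory via the `data-root` option in `/etc/docker/daemon.json`; the path below is an arbitrary example, and switching it requires a daemon restart and hides the images already stored under the default location:

```json
{
  "data-root": "/mnt/docker-data"
}
```

This is why an *extra* data directory, rather than a replacement, would be a feature request for Docker itself.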
I have tried option 2 in the past and felt it was a dead end, but I remain open to being proven wrong about this. My understanding is that GitHub Actions is tightly integrated with Docker and doesn't want users messing with it very much. I don't believe Docker was omitted from GitHub's official caching examples by accident. If it weren't, there would never have been a reason to create this action in the first place.
I suspect that even if either of these implementations works now, it will be brittle to future changes made by GitHub to prevent this level of tampering with the Docker configuration that their own features rely on. rootless-docker restarts the Docker daemon in rootless mode to prevent permission errors similar to the one you encountered. It was broken repeatedly without warning by undocumented changes to GitHub Actions' handling of Docker, as you can see from the stream of fixes in July 2022. If something similar happened here, we could revert to `docker save` + `docker load` in the worst case, so I feel it would be worth the risk if there is a way to improve performance.
I am closing this issue for now, but happy to reopen it if an approach to improving performance is found.
Hi,

I wanted to use this `docker-cache` action, and this is my code:

But as is clear from this image, your action increased my workflow time from 1:30 up to 2:00 minutes:

The problem is all about `docker save` and `docker load`. Both of them are extremely slow. In GitHub Actions it's faster to download images than to `docker load` them from cache.

Can you please work on `/var/lib/docker/image` to skip the `docker save` and `docker load` steps? Because when I use this code:

I get this error:

If you work on this part, on simply putting the `/var/lib/docker/image` directory back in its place, then we can skip `docker save` and `docker load`, and that drastically reduces the time. But for now, your action actually worsens the workflow time.

Thank you