devthefuture-org / dockerfile-x

Dockerfile factorization superset
https://codeberg.org/devthefuture/dockerfile-x/
MIT License

Cache invalidation issue (with ARG instructions) #16

Closed shinsenter closed 4 months ago

shinsenter commented 4 months ago

Hi,

First of all, I would like to thank you for creating devthefuture/dockerfile-x. I am using it, and it works as expected. However, I have encountered an issue related to build cache when using an ARG parameter.

I pass an ARG parameter called BUILD_TIMESTAMP and have placed this parameter at the end of my Dockerfile. In principle, the build cache should be used if the previous steps remain unchanged, with only the last instruction being updated.

After some investigation, it appears that all ARG instructions are moved to the top of the compiled Dockerfile when using devthefuture/dockerfile-x, which invalidates the build cache. As a result, my builds run from scratch instead of using the cache.

Could you please look into this issue? Any guidance or a possible fix would be greatly appreciated.

Best regards,

shinsenter commented 4 months ago

The docker documentation provides details:

Impact on build caching

ARG variables are not persisted into the built image as ENV variables are. However, ARG variables do impact the build cache in similar ways. If a Dockerfile defines an ARG variable whose value is different from a previous build, then a "cache miss" occurs upon its first usage, not its definition. In particular, all RUN instructions following an ARG instruction use the ARG variable implicitly (as an environment variable), thus can cause a cache miss. All predefined ARG variables are exempt from caching unless there is a matching ARG statement in the Dockerfile.
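
To illustrate the quoted behavior, here is a minimal sketch using a hypothetical Dockerfile (not my actual file; the base image and the commands are assumptions chosen only for the example). With the ARG declared last, only the final RUN should depend on BUILD_TIMESTAMP:

```dockerfile
# Hypothetical example; image and commands are placeholders.
FROM alpine:3.20

# This RUN comes before the ARG declaration, so per the documentation
# quoted above it does not use BUILD_TIMESTAMP and can stay cached
# when only the timestamp changes.
RUN apk add --no-cache curl

# Declared at the end: only instructions after this line are affected
# by a changed value.
ARG BUILD_TIMESTAMP
RUN echo "built at ${BUILD_TIMESTAMP}" > /build-info
```

If the ARG line were hoisted above the first RUN, every RUN after it would use the variable implicitly, so a new timestamp could cause a cache miss from that point on.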

devthejo commented 4 months ago

OK, thanks for reporting, I will look into fixing this ASAP.

devthejo commented 4 months ago

this should be fixed now (version 1.4.2), for more details see https://github.com/devthefuture-org/dockerfile-x/issues/10#issuecomment-2268943569

let me know if all is good from now :-)

shinsenter commented 4 months ago

@devthejo Let me try testing this new version and give feedback later. 👀

It looks like the Docker images on Docker Hub aren't updated to the latest version yet, right?

devthejo commented 4 months ago

[Screenshot 2024-08-05 at 15-31-27: devthefuture/dockerfile-x tags on Docker Hub]

it's updated

shinsenter commented 4 months ago

@devthejo

The ARG instructions are now positioned in the compiled Dockerfile as desired, but the issue with the cache remains unresolved.

I have tried rebuilding the Docker images from the same source code more than three times, and each time, Docker rebuilds from scratch without using the cache (to be precise, the cache is used only for downloading the base images from Docker Hub).

[Screenshot 2024-08-05 at 22 45 33]

However, the issue may only occur when using the devthefuture/dockerfile-x syntax. When I try copying the content of the compiled Dockerfile and rebuilding with the same parameters, the cache is utilized.

[Screenshot 2024-08-05 at 22 48 59]

I am using Docker Desktop (macOS version), and that's how I am verifying whether the cache is working. If you have a better way to debug this, please let me know.


You can see the difference in the Cache column and the Duration column in the two images above.

shinsenter commented 4 months ago

@devthejo

Anyway, thank you for your prompt response. I will try building my Docker images using my GitHub workflows and monitor the build outputs for the next few days. If the cache is utilized, I will inform you soon and close this ticket.

Cheers.

devthejo commented 4 months ago

OK, thank you for your great feedback. For information, I use BuildKit myself, both as a service on my cluster (https://github.com/moby/buildkit/tree/master/examples/kubernetes) and by enabling the option on my Linux laptop (in older Docker versions I was running: DOCKER_BUILDKIT=1 docker build .), and my cache works with dockerfile-x. In my experience, in GitHub Actions the cache is not always available, even with a registry cache, which is why I use a BuildKit service. I care a lot about caching (as you can see in this project: https://github.com/devthefuture-org/yarn-plugin-fetch). So let me know if I can do something to help. Related (if this can help):

shinsenter commented 4 months ago

@devthejo

Thank you for the reference links.

I'm not sure if this is the cause, so I'd like you to confirm. Between builds, the temporary names of the included Dockerfiles change slightly (for example, dockerfile_1722866031).

[Screenshot 2024-08-06 at 0 29 27]

Could these changes affect the checksum of the build context and cause the cache to become ineffective? I'm just curious.

devthejo commented 4 months ago

As far as I know (and in my experience), the comments have no effect on the cache

shinsenter commented 4 months ago

Oh, thank you for the confirmation.

shinsenter commented 4 months ago

@devthejo I just checked the build results of my workflows and saw that the cache was used in some builds due to repetition in the build matrix I am using.

Thanks again for fixing this issue. I will close this ticket as my problem has been resolved.