buildpacks / pack

CLI for building apps using Cloud Native Buildpacks
https://buildpacks.io
Apache License 2.0
2.59k stars 289 forks source link

pack build takes 27x longer when using containerd image store #2272

Open edmorley opened 1 month ago

edmorley commented 1 month ago

Summary

When using the containerd image store (which is now the default for new installs for Docker for Desktop), along with an ephemeral builder image (ie: one where additional buildpacks have been added beyond the ones in the builder, or the group order has been overridden), there is a 20 second delay/hang before the "Analyzing" phase starts, plus image exporting takes 9 seconds instead of milliseconds. This occurs even if the build is forced to trusted mode using --trust-extra-buildpacks.


Reproduction

Steps
  1. Go to Docker for Desktop (macOS) settings and ensure Use containerd for pulling and storing images is enabled.
  2. mkdir testcase && cd $_
  3. touch Procfile
  4. time pack build --builder heroku/builder:24 --pull-policy if-not-present --timestamps --verbose --buildpack heroku/procfile --trust-extra-buildpacks testcase
  5. Repeat (4) again to get the fully warm step duration (eg with all images pulled)
  6. Now disable containerd and repeat all of the above
Current behavior

With containerd enabled:

real    0m31.842s
user    0m16.259s
sys 0m2.547s

With containerd disabled:

real    0m1.155s
user    0m0.098s
sys 0m0.053s

The majority of the delay occurs at these two places:

~20 second delay here:

2024/10/21 11:53:30.291079 Setting custom order
2024/10/21 11:53:30.291125 Creating builder with the following buildpacks:
2024/10/21 11:53:30.291140 -> heroku/dotnet@0.1.4
...
2024/10/21 11:53:30.291416 -> heroku/sbt@6.0.3
<20 second delay here>
2024/10/21 11:53:50.449847 Using build cache volume pack-cache-library_testcase_latest-f7f9f5f88c8a.build
2024/10/21 11:53:50.469796 Created ephemeral bridge network pack.local-network-797a6e6f6f617a696a6e with ID 37960f04dc6dfaee7d8514c3e807889122720a8ab6433bdda256cf046b0ad17b

~9 second delay when exporting the app image:

2024/10/21 11:53:50.726872 Timer: Saving testcase... started at 2024-10-21T10:53:50Z
Saving testcase...
2024/10/21 11:53:59.728875 *** Images (22e29d1c8e45):
      testcase

*** Image ID: 22e29d1c8e452d8af08202c31fbbec09eab0d54c6d16136abde3c5cedca0a1aa
2024/10/21 11:53:59.728919
*** Manifest Size: 1087
Timer: Saving testcase... ran for 9.002005629s and ended at 2024-10-21T10:53:59Z
Timer: Exporter ran for 9.033616962s and ended at 2024-10-21T10:53:59Z

When containerd is disabled, the timings for the problematic sections shown above are instead:

~40ms delay (rather than 20 seconds):

2024/10/21 12:01:17.046436 Setting custom order
2024/10/21 12:01:17.046462 Creating builder with the following buildpacks:
2024/10/21 12:01:17.046471 -> heroku/dotnet@0.1.4
...
2024/10/21 12:01:17.046543 -> heroku/sbt@6.0.3
<only a 40ms second delay here>
2024/10/21 12:01:17.085716 Using build cache volume pack-cache-library_testcase_latest-f7f9f5f88c8a.build
2024/10/21 12:01:17.107381 Created ephemeral bridge network pack.local-network-77736a65746968666967 with ID f0e328cf50f3c9781d7f234ba79ca24792c74f8c3a2cc0b2add14439a0184f14

~30ms delay when exporting the app image (rather than 9 seconds):

2024/10/21 12:01:17.287787 Timer: Saving testcase... started at 2024-10-21T11:01:17Z
Saving testcase...
2024/10/21 12:01:17.315210 *** Images (ae79380ddd2d):
      testcase

*** Image ID: ae79380ddd2d20421848292756f1a0c2e42f7c079572b0a54c6ea45ba41dedb8
2024/10/21 12:01:17.315244
*** Manifest Size: 1087
Timer: Saving testcase... ran for 27.457208ms and ended at 2024-10-21T11:01:17Z
Timer: Exporter ran for 60.499042ms and ended at 2024-10-21T11:01:17Z

Full logs:

Expected behavior

Environment

pack info
$ pack report
Pack:
  Version:  0.35.1+git-3a22a7f.build-6099
  OS/Arch:  darwin/arm64

Default Lifecycle Version:  0.20.0

Supported Platform APIs:  0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.10, 0.11, 0.12, 0.13

Config:
  default-builder-image = "[REDACTED]"
docker info
$ docker info
Client:
 Version:    27.2.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /Users/emorley/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /Users/emorley/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /Users/emorley/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/emorley/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/emorley/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /Users/emorley/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/emorley/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.3.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.13.0
    Path:     /Users/emorley/.docker/cli-plugins/docker-scout

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 8
 Server Version: 27.2.0
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fc6bcff51318944179630522a095cc9dbf9f353
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.4-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 6
 Total Memory: 15.6GiB
 Name: docker-desktop
 ID: <REDACTED>
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/emorley/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile
edmorley commented 1 month ago

I presume this is related to:

edmorley commented 1 month ago

This issue causes my CNB's integration test suite to take 9 minutes locally instead of 39 seconds.

It seems this issue has been known for some time (but unfortunately not documented in an easy to discover way) and that a solution here is currently blocked on upstream Docker API additions.

I think we should:

  1. Have Pack CLI emit a performance warning if containerd is enabled, until this is resolved (I spent a non-zero amount of time tracking this down and writing an STR, it would be good to avoid others having to figure it out themselves)
  2. Document the containerd issues in the docs (eg on https://buildpacks.io/docs/for-platform-operators/how-to/integrate-ci/pack/)
  3. Improve the overall log output (if only when verbose is enabled), so debugging slow ephemeral builder creation related issues is easier in the future
  4. Drive the API addition with upstream. Someone from Docker did reach out in https://cloud-native.slack.com/archives/C0331B5QS02/p1728595943578249?thread_ts=1728591887.872869&cid=C0331B5QS02 - we should ping them again given containerd is (a) now the default image store for new Docker Desktop installs, (b) is essential for multi-arch workflows so it's a pain to not have it enabled.
natalieparellano commented 1 month ago

Thank you @edmorley for the analysis and suggested actions - I agree with what you've proposed here. Given that points 1-3 are low hanging fruit, I've marked this as a good first issue. If anyone is interested in taking it up, I'd be happy to guide.

wuhenggongzi commented 2 weeks ago

Kindly to ask if there is somebody working for this problem? If not, I want to work on this. @natalieparellano Could you suggest something I can improve to make it work?

jjbustamante commented 2 weeks ago

Hi @wuhenggongzi! I think nobody is working on this, thanks a lot for your help.

Let me try to help with task #1

  1. Have Pack CLI emit a performance warning if containerd is enabled, until this is resolved (I spent a non-zero amount of time tracking this down and writing an STR, it would be good to avoid others having to figure it out themselves)

I think we can do two things:

  1. We can put the warning in imgutil during the save operation of an image

    • To do this, we may need to add a logger option in the local.New constructor and pack will passthrough the logger to imgutil, notice you will need to update the constructor for the Store too.
  2. We can put the warning in pack when creating the docker client

    • To do this, we can:
      • Expose the imgutil function usesContainerdStorage that I mentioned above and use it here to validate if containerd is used and write the warning
      • Copy & Paste the function and do the same thing.

I think I prefer option 1, because at the end the issue must be solved in imgutil and once is fixed we can remove the warning. but I would like to hear some thoughts from @edmorley and @natalieparellano

natalieparellano commented 2 weeks ago

@jjbustamante I have been mulling this over, I think option 1 might be a bit clunky because then we'd need to remember to call something like image.Warnings() after calling image.Save() or else change the signature of that method which would be messy.

It might be easier to print the warning at the point where we construct the image, but that still requires us to call a separate method from NewImage and at that point we might just go ahead and call usesContainerdStorage.

I think we might expose something like IncursDaemonPerformancePenalty in imgutil that we can always call, but when we fix the issue in imgutil we can just update that method to return false.

WDYT?

jjbustamante commented 2 weeks ago

Thanks for the feedback @natalieparellano, what I was thinking about option 1 was the following:

natalieparellano commented 2 weeks ago

I see. The downside there is that imgutil would have full control over whether that message gets printed as debug / info / warn, etc - pack & the lifecycle have no say. That said I think we're pretty much in agreement that a warn message is the way to go. I'm still in favor of exposing something like IncursDaemonPerformancePenalty, but I won't stand in the way of whatever is easiest to implement.

wuhenggongzi commented 2 weeks ago

@natalieparellano , @jjbustamante Thanks for your help, please forgive my poor English. I tried to understand your instructions, this is my understanding:

The above are the changes made before fixing this problem.

jjbustamante commented 2 weeks ago

Hi @wuhenggongzi,

You can start focusing on changes in imgutil repository, the first 4 bullets I mentioned above have the places where I think you can do the changes, BUT, I think we can also explore Natalie's idea IncursDaemonPerformancePenalty but we can think on that later.

I will suggest, you create a draft PR on imgutil and we will guide you from there. I will try to create an issue on imgutil too

jjbustamante commented 2 weeks ago

@wuhenggongzi I created the following issue on imgutil

wuhenggongzi commented 1 week ago

@jjbustamante Thank you for your guidance and help. I have submitted a PR now, but it indicates that there is a problem with my coverage, but I don't know how I should deal with it. Personally, I think it points to code that should already be tested, what should I do to pass it?

natalieparellano commented 1 week ago

Partially resolved by https://github.com/buildpacks/pack/pull/2284, but there is more to do, so I will leave this open and move it to the next milestone