bazel-contrib / rules_oci

Bazel rules for building OCI containers
Apache License 2.0
301 stars 157 forks source link

Runtime of oci_image is proportional to the total size of of all tars plus the base image #365

Closed wskinner-dd closed 6 months ago

wskinner-dd commented 1 year ago

The time required to build oci_image seems to be proportional to the total size of the tars plus the size of the base image. This was surprising to me. I expected oci_image to take time proportional to the size of only the modified or added layers. After reading through the source code and running some tests, it seems like the reason for this may have something to do with crane mutate.

Are these runtime characteristics an intended or necessary property of rules_oci or the OCI spec? Or are they due to implementation details of rules_oci and its dependencies like crane?

How to reproduce the issue

I tested this behavior by comparing the build time of four scenarios:

  1. An image with a 21GB base image, one 3.6GB layer and one tiny layer.
  2. An image with a 21GB base and one tiny layer.
  3. An image with a 1GB base, one 3.6GB layer and one tiny layer.
  4. An image with a 1GB base and one tiny layer.

For both scenarios, I built the target, then modified app.py and measured the time required to build again.

$ bazel build //:two_layer_big_image
INFO: Elapsed time: 58.190s, Critical Path: 58.13s

$ bazel build //:one_layer_big_image
INFO: Elapsed time: 21.697s, Critical Path: 21.65s

$ bazel build //:two_layer_small_image
INFO: Elapsed time: 48.928s, Critical Path: 48.88s

$ bazel build //:one_layer_small_image
INFO: Elapsed time: 1.089s, Critical Path: 1.04s 

BUILD

pkg_tar(
    name = "app_tar",
    srcs = ["//:app.py"],
)

pkg_tar(
    name = "big_file",
    srcs = ["//:ubuntu-22.04.3-desktop-amd64.iso"],
)

oci_image(
    name = "two_layer_small_image",
    base = "@python_311",
    tars = [
        ":big_file",
        ":app_tar",
    ],
)

oci_image(
    name = "one_layer_small_image",
    base = "@python_311",
    tars = [
        ":app_tar",
    ],
)

oci_image(
    name = "two_layer_big_image",
    base = "@nct_image",
    tars = [
        ":big_file",
        ":app_tar",
    ],
)

oci_image(
    name = "one_layer_big_image",
    base = "@nct_image",
    tars = [
        ":app_tar",
    ],
)
wskinner-dd commented 1 year ago

After reading through the source code, I think I understand the reason this is slower than I expected. Building an oci_image basically does four things:

I wonder if there may be room to speed up the push and mutate steps. If zot's blob cache can be reused across build invocations, push becomes nearly instantaneous (this won't work for crane registries).

For crane mutate, it should in theory be possible to append each layer in order and cache the results. So if tar n changes, we could just append it to the cached result of a previous crane mutate --append <tar n - 1>.

thesayyn commented 1 year ago

https://github.com/bazel-contrib/rules_oci/pull/324 is gonna solve this. I can't continue my work on that right now, packed with other stuff.

thesayyn commented 10 months ago

Blocked by https://github.com/bazelbuild/bazel/issues/20891

thesayyn commented 9 months ago

Working on it at https://github.com/bazelbuild/bazel/pull/21263

thesayyn commented 6 months ago

this is fixed by combination of #560 #559 #558

brett-patterson-ent commented 6 months ago

@thesayyn Thank you for the great work here! I was trying these changes out from the tip of the 2.x branch and we are seeing huge performance improvements.

I did run into one issue I wanted to bring up - I understand these changes aren't officially released yet so apologies if this is a known issue. When trying to include tar layers that are generated from a repository rule, the generated image.sh script fails to find them:

realpath: bazel-out/k8-fastbuild/bin/external/_main~debian_packages~apt__jammy_amd64__libgomp1_12_u_20220319_u_1ubuntu1/layer.tar: No such file or directory

I was able to fix this with a patch to include the layers as inputs to the action, even when symlinking:

diff --git a/oci/private/image.bzl b/oci/private/image.bzl
index 93a6668..bca45a6 100644
--- a/oci/private/image.bzl
+++ b/oci/private/image.bzl
@@ -175,8 +175,7 @@ def _oci_image_impl(ctx):
     # If tree artifact symlinks are supported just add tars into runfiles.
     if use_symlinks:
         transitive_inputs = transitive_inputs + ctx.files.tars
-    else:
-        inputs = inputs + ctx.files.tars
+    inputs = inputs + ctx.files.tars

     # add layers
     for (i, layer) in enumerate(ctx.files.tars):

Is this an appropriate fix or am I doing something wrong on my end?

thesayyn commented 6 months ago

It's very possible that we broke something, could you please file an issue with a reproducer? i'd like to take a look

brett-patterson-ent commented 6 months ago

It's very possible that we broke something, could you please file an issue with a reproducer? i'd like to take a look

Filed https://github.com/bazel-contrib/rules_oci/issues/566, thanks!

thesayyn commented 6 months ago

Is this an appropriate fix or am I doing something wrong on my end?

Principled fix is here: https://github.com/bazel-contrib/rules_oci/pull/567