Open alexeagle opened 9 months ago
Hi @alexeagle, I had been thinking about this problem, but never thought I could use an aspect for it. So after seeing your suggestion, I went ahead and implemented this feature. The bummer was that I hit the maximum number of layers, which I believe is 127 for Docker:
3557592c9b6f: Loading layer [==================================================>] 432B/432B
c7d6f3eb36c2: Loading layer [==================================================>] 794B/794B
956257f1d3a3: Loading layer [==================================================>] 3.183kB/3.183kB
667da44de747: Loading layer [==================================================>] 1.343kB/1.343kB
88a0745cd9e2: Loading layer [==================================================>] 11.6kB/11.6kB
f3af998a1f5e: Loading layer [==================================================>] 3.543kB/3.543kB
9abe9f6a4fae: Loading layer [==================================================>] 906B/906B
3b5c52259645: Loading layer [==================================================>] 1.711MB/1.711MB
710de8cf73d3: Loading layer [===> ] 32.77kB/497.4kB
max depth exceeded
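For reference, that "max depth exceeded" error comes from Docker's overlayfs layer limit (~127 layers, as above). A build-time guard could fail earlier; a minimal Starlark sketch, where the constant and helper are my own hypothetical names:

MAX_DOCKER_LAYERS = 127

def _check_layer_count(tars):
    # Fail during the Bazel build rather than at `docker load` time.
    if len(tars) > MAX_DOCKER_LAYERS:
        fail("image would have {} layers, but docker supports at most {}".format(
            len(tars),
            MAX_DOCKER_LAYERS,
        ))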
Not sure if you had thoughts about how this will play out.
I can share my implementation, maybe in
https://github.com/aspect-build/bazel-examples/tree/main/oci_python_image
A layer per package would be too much for some docker fs implementations (I forget which ones). We typically keep separate image layers for large packages, and bundle all smaller packages together. Something like this:
# Packages big enough to deserve their own image layer. These are grep
# patterns over runfiles paths, in BRE syntax (hence the escaped \( \| \)).
HEAVY_PIP_PACKAGES = ["\\(torch\\|triton\\)/", "nvidia_", "xgboost/", "\\(jaxlib\\|jax\\)/", "scipy/", "numpy/"]

# Runfiles-path patterns for the hermetic interpreter, pip packages, and all
# other external repos (bzlmod canonical repo names, hence the ~ separators).
PY_INTERPRETER_REGEX = "\\.runfiles/rules_python~[^~/]\\+~python~python_[^/]\\+_x86_64-unknown-linux-gnu"
PIP_PACKAGES_REGEX = "\\.runfiles/rules_python~[^~/]\\+~pip~"
EXTERNAL_PACKAGES_REGEX = "\\.runfiles/[^/]\\+~"

# Shell commands that split the mtree manifest ($<) into one sub-manifest per
# layer ($@); every manifest entry lands in exactly one layer.
LAYER_FILTER_CMDS = {
    "_app": "grep -v -e '{}' -e '{}' -e '{}' $< >$@".format(PY_INTERPRETER_REGEX, PIP_PACKAGES_REGEX, EXTERNAL_PACKAGES_REGEX),
    "_external": "(grep -e '{}' $< | grep -v -e '{}' -e '{}' >$@) || true".format(EXTERNAL_PACKAGES_REGEX, PIP_PACKAGES_REGEX, PY_INTERPRETER_REGEX),
    "_interpreter": "grep -e '{}' $< >$@".format(PY_INTERPRETER_REGEX),
    "_pip": "(grep -e '{}' $< | grep -v -e '~pip~pip_[^_]\\+_\\({}\\)' >$@) || true".format(PIP_PACKAGES_REGEX, "\\|".join(HEAVY_PIP_PACKAGES)),
} | {
    "_pip_heavy_{}".format(i): "(grep '\\.runfiles/rules_python~[^~/]\\+~pip~pip_[0-9]\\+_{}' $< >$@) || true".format(suffix)
    for i, suffix in enumerate(HEAVY_PIP_PACKAGES)
}
# Keep heavy, slow-changing layers first so they get the best cache reuse.
ORDERED_LAYERS = ["_interpreter", "_external"] + [
    "_pip_heavy_{}".format(i)
    for i in range(len(HEAVY_PIP_PACKAGES))
] + ["_pip", "_app"]
def py_image(...):
    ...
    # `manifest`, `layers`, and `entrypoint_layer` are set up in elided code above.
    for (suffix, cmd) in LAYER_FILTER_CMDS.items():
        # Split the py_binary's mtree manifest into one spec per layer.
        native.genrule(
            name = manifest + suffix,
            srcs = [manifest],
            outs = [manifest + suffix + ".spec"],
            cmd = cmd,
            **kwargs
        )
        # Pack just the files listed in that spec into a layer tarball.
        tar(
            name = layers + suffix,
            srcs = [binary],
            mtree = manifest + suffix,
            **kwargs
        )
    oci_image(
        name = name,
        base = base,
        entrypoint = ["/entrypoint.sh", entrypoint],
        workdir = "/" + entrypoint + ".runfiles/_main",
        # Layer order matters for caching: slow-changing layers go first.
        tars = [entrypoint_layer] + [
            layers + suffix
            for suffix in ORDERED_LAYERS
        ],
        **kwargs
    )
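A hypothetical call site for this macro (the target and base names are illustrative, not from the original post):

py_image(
    name = "server_image",
    binary = ":server",
    base = "@distroless_python",
    entrypoint = "server",
)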
@ewianda thanks for that investigation. Since we can only have "a few" layers, I don't have a principled suggestion for this. Special-casing a handful of packages like pytorch seems like the only valid compromise, but it's not a very clean API to ask the user to tell us which packages are like that at the py_image call site.
Maybe the best we can do for this issue is document the workaround. I hope that as the ML folks get to a steadier state, they can follow normal software engineering practices and not ship multi-GB packages?
Yes, documentation and guidance are our best bets here. The code that I pasted above can be refactored to accept a list of heavy packages from the user to tease out into separate layers. Anything more (like setting a wheel size limit and automatically computing this list) will need support from other components in the ecosystem.
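A rough sketch of that refactor, assuming the constants from the snippet above are in scope; the `heavy_packages` parameter is hypothetical, not an existing rules_oci or rules_py API:

def layer_filter_cmds(heavy_packages):
    # Same filters as above, but the heavy-package list comes from the caller.
    # ("_app" and "_external" would be carried over unchanged.)
    cmds = {
        "_interpreter": "grep -e '{}' $< >$@".format(PY_INTERPRETER_REGEX),
        "_pip": "(grep -e '{}' $< | grep -v -e '~pip~pip_[^_]\\+_\\({}\\)' >$@) || true".format(
            PIP_PACKAGES_REGEX,
            "\\|".join(heavy_packages),
        ),
    }
    for i, suffix in enumerate(heavy_packages):
        cmds["_pip_heavy_{}".format(i)] = "(grep '\\.runfiles/rules_python~[^~/]\\+~pip~pip_[0-9]\\+_{}' $< >$@) || true".format(suffix)
    return cmds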
The Nix ecosystem had a similar problem of limiting the layer count when building images with a layer per package. IIRC the approach described in this blog post by Graham is what ended up being implemented; maybe that's of interest to some here.
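If I understand the post correctly, the idea is to rank packages by how often they are referenced and give only the most popular ones their own layer, bundling the long tail into a single final layer. A loose Starlark sketch of that grouping (the function names and the weight map are hypothetical):

def _weight(entry):
    return entry[1]

def group_packages(package_weights, layer_budget):
    # package_weights: {package_name: reference_count_or_size}
    ranked = sorted(package_weights.items(), key = _weight, reverse = True)
    # Top budget-1 packages each get their own layer...
    own_layer = [pkg for pkg, _ in ranked[:layer_budget - 1]]
    # ...and everything else shares the last one.
    shared_layer = [pkg for pkg, _ in ranked[layer_budget - 1:]]
    return own_layer, shared_layer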
Nice, thanks @betaboon that's indeed very interesting. I dunno when we'll get to it... and it's not a rules_py issue, really more of a rules_oci issue that it should do something smart when there are "too many layers".
(Really it's another docker problem; docker build was so poorly thought out :( )
What is the current behavior?
Currently our best effort at making a layered image for Python is three layers: interpreter, site-packages, app: https://github.com/aspect-build/bazel-examples/blob/main/oci_python_image/py_layer.bzl
However, when a package appears under multiple applications, each app's site-packages layer contains a duplicate of that package, making the build outputs larger. When pytorch is one such package (it is many gigabytes for the GPU variant), this problem leads to builds spending minutes just fetching build inputs from a remote cache.
Describe the feature
We can probably do something with aspects to visit each third-party package and produce a tar file containing its site-packages entry as a distinct layer. Then when we build a container image for an application, these tar files should just get re-used rather than being re-built.
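A minimal sketch of that aspect idea, assuming each third-party package is reachable as a dep of the py_binary; the aspect name, shell packing step, and output-group wiring here are hypothetical, not an existing rules_py API:

def _py_package_layer_impl(target, ctx):
    # Only external (third-party) packages get their own layer; first-party
    # targets belong in the app layer.
    if not target.label.workspace_name:
        return []
    layer = ctx.actions.declare_file(target.label.name + ".layer.tar")
    files = target[DefaultInfo].files
    args = ctx.actions.args()
    args.add(layer)
    args.add_all(files)
    ctx.actions.run_shell(
        inputs = files,
        outputs = [layer],
        arguments = [args],
        # Real code would place files at their site-packages paths; a plain
        # tar of the file list keeps the sketch short.
        command = 'out="$1"; shift; tar -cf "$out" "$@"',
    )
    return [OutputGroupInfo(package_layers = depset([layer]))]

py_package_layer = aspect(
    implementation = _py_package_layer_impl,
    attr_aspects = ["deps"],
)

Because an aspect runs once per target no matter how many images depend on it, the per-package tars would be built (and remote-cached) once and shared across applications.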