cloudflare / serverless-registry

A container registry backed by Workers and R2.
Apache License 2.0
997 stars 36 forks

Request Body Limit circumvention #42

Closed captainjapeng closed 2 months ago

captainjapeng commented 2 months ago

To circumvent that limitation, you can manually add the layer and the manifest into the R2 bucket or use a client that is able to chunk uploads in sizes less than 500MB (or the limit that you have in your Workers plan).

As quoted above, we can add the layers and manifest directly to the R2 bucket. Can you help me understand how to access the layers and manifest directly from Docker after building the image? And are there any specific requirements, such as httpMetadata, when uploading to R2?

I'm willing to create a script that does this, but I need a little direction on where to locate and access the layers.

Thanks!

gabivlj commented 2 months ago

Hello!

Blobs are stored in path <namespace>/blobs/<digest> and same thing with manifests but replacing 'blobs' with 'manifests'.
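
The key layout described above can be sketched as a small helper. This is only an illustration of the `<namespace>/<kind>/<digest>` pattern from the comment; the function and type names are hypothetical and are not taken from the repository, and the digest below is a placeholder.

```typescript
// Hypothetical helper: builds an R2 object key following the layout
// "<namespace>/blobs/<digest>" or "<namespace>/manifests/<digest>".
type ObjectKind = "blobs" | "manifests";

function objectKey(namespace: string, kind: ObjectKind, digest: string): string {
  return `${namespace}/${kind}/${digest}`;
}

// Placeholder digest, used only to show the shape of the resulting keys.
const placeholderDigest = "sha256:" + "a".repeat(64);

console.log(objectKey("imagename", "blobs", placeholderDigest));
console.log(objectKey("imagename", "manifests", placeholderDigest));
```

Whether any extra object metadata (such as httpMetadata) is required is not answered in this thread, so the sketch covers only the key names.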

captainjapeng commented 2 months ago

Can you confirm if the docker save command should be the command I use to get these digests and manifests?

These are the contents of the tar output of docker save: [image]

captainjapeng commented 2 months ago

I can't seem to find how the docker push command generates its digests for the layers. When I check the output of docker save, it contains different hashes.

captainjapeng commented 2 months ago

Hi! I was able to create a script that uploads to serverless-registry via regctl, but it's somewhat inefficient because it generates a tar and then extracts it again.

param (
    [string]$registry = "your.registry.endpoint",
    [string]$name = "imagename",
    [string]$version = "latest"
)

# Create the 'dist' directory if it doesn't exist
if (-not (Test-Path "./dist")) {
    New-Item -ItemType Directory -Path "./dist"
}

# Build the Docker image and export it to an OCI format inside the 'dist' directory
docker buildx build -t ${name}:${version} -o type=oci,dest=./dist/${name}.oci .

# Import the OCI image using regctl inside the 'dist' directory
regctl.exe image import ocidir://dist/${name}:${version} ./dist/${name}.oci

# Remove the OCI image file inside the 'dist' directory
Remove-Item ./dist/${name}.oci

# Copy the image to the remote Docker registry
regctl.exe image copy ocidir://dist/${name}:${version} ${registry}/${name}:${version}

# Clean up the OCI directory
Remove-Item -Recurse -Force ./dist/${name}

gabivlj commented 2 months ago

Hello! Sorry for the delay. I was on PTO and couldn't focus on this thread for a while.

You are correct, docker save can export the image. I suspect the digests differ from those of docker push because docker push gzip-compresses each layer before pushing it, so the hash is computed over different bytes.
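
To illustrate why compression alone would change a digest, here is a minimal sketch assuming a Node.js runtime (`node:crypto` and `node:zlib`). The byte string is a stand-in for a layer, not a real tarball, and the helper name is hypothetical. It does not reproduce what docker push does internally; it only shows that hashing the same content before and after gzip yields two different sha256 digests.

```typescript
import { createHash } from "node:crypto";
import { gzipSync } from "node:zlib";

// Formats a digest the way registries write it: "sha256:" plus the hex hash.
function sha256Digest(data: Uint8Array): string {
  return "sha256:" + createHash("sha256").update(data).digest("hex");
}

// Stand-in for an uncompressed layer (assumption: not a real tar archive).
const rawLayer = Buffer.from("layer contents ".repeat(1000));

// The same content after gzip compression.
const gzippedLayer = gzipSync(rawLayer);

const rawDigest = sha256Digest(rawLayer);
const gzipDigest = sha256Digest(gzippedLayer);

console.log("uncompressed:", rawDigest);
console.log("gzipped:     ", gzipDigest);
console.log("same digest? ", rawDigest === gzipDigest);
```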

Good job btw! This is very nice. I will look into how to include something in this repo that is able to push layers to the registry when they're above a certain size.

captainjapeng commented 2 months ago

Oh, alright. One thing I also did was update chunk.ts to use a higher minimum chunk size, because it was throwing a max call stack error. This also made the upload faster.

// 80MiB
export const MINIMUM_CHUNK = 1024 * 1024 * 80;

// 5GiB
export const MAXIMUM_CHUNK = 1024 * 1024 * 1024 * 5;

// 500MB
export const MAXIMUM_CHUNK_UPLOAD_SIZE = 1000 * 1000 * 500;

With these changes, I could successfully upload a ~4.5GB layer.
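
As a rough sense of scale, the arithmetic below estimates how many requests a ~4.5GB layer needs at different chunk sizes. It is a sketch under assumptions: the constants mirror the values quoted above, the function name is hypothetical, and the chunk size a client actually picks between the minimum and the maximum is not stated in this thread.

```typescript
// Values mirroring the chunk.ts snippet quoted in the thread.
const MIN_CHUNK_BYTES = 1024 * 1024 * 80; // 80MiB
const MAX_UPLOAD_BYTES = 1000 * 1000 * 500; // 500MB

// Number of requests when every chunk except possibly the last one is full.
function chunkCount(layerBytes: number, chunkBytes: number): number {
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");
  return Math.ceil(layerBytes / chunkBytes);
}

// ~4.5GB, the layer size mentioned in the thread.
const layerBytes = 4.5 * 1000 * 1000 * 1000;

console.log(chunkCount(layerBytes, MAX_UPLOAD_BYTES)); // 9
console.log(chunkCount(layerBytes, MIN_CHUNK_BYTES)); // 54
```

Larger chunks mean fewer round trips, which is consistent with the faster upload reported above, though the thread does not confirm that this is the cause.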

Strangersknowme commented 2 months ago

Can you explain how this script helps with pushing large images? I want to use serverless-registry to push my ML app's images, which can be tens of gigabytes. How does this solve that? I don't know how regctl copy works; does it break each image down into smaller chunks?

I can see from the script you provided that we are still using the same serverless-registry endpoint, so won't the request body limit still apply to uploads? For example, an ML app has a layer for the requirements.txt installation, which can be several gigabytes. How does that get uploaded?

You say you successfully uploaded a ~4.5GB layer. I just want to understand how this works. @captainjapeng

captainjapeng commented 2 months ago

The regctl client supports the chunked upload mechanism of the distribution specification, while docker push does not. That's why it works, and chunk.ts specifies how big these chunks should be.
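
For readers unfamiliar with chunked uploads: in the OCI distribution specification, a client opens an upload session with `POST /v2/<name>/blobs/uploads/`, sends each chunk with a `PATCH` to the returned location carrying a `Content-Range: <start>-<end>` header (both ends inclusive), and closes the session with a `PUT` that includes `?digest=<digest>`. The sketch below only computes the per-chunk ranges and makes no network calls; the type and function names are hypothetical and do not come from regctl or this repository.

```typescript
// Hypothetical description of one chunk request in an upload session.
interface ChunkRequest {
  method: "PATCH";
  contentRange: string; // "<start>-<end>", both inclusive
  contentLength: number;
}

// Splits a blob of totalBytes into consecutive chunks of at most chunkBytes.
function planChunks(totalBytes: number, chunkBytes: number): ChunkRequest[] {
  if (totalBytes < 0) throw new RangeError("totalBytes must not be negative");
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");

  const plan: ChunkRequest[] = [];
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    // The last chunk may be shorter than chunkBytes.
    const end = Math.min(start + chunkBytes, totalBytes) - 1;
    plan.push({
      method: "PATCH",
      contentRange: `${start}-${end}`,
      contentLength: end - start + 1,
    });
  }
  return plan;
}

// Tiny example: a 10-byte blob sent in chunks of at most 4 bytes.
for (const request of planChunks(10, 4)) {
  console.log(request.method, request.contentRange, request.contentLength);
}
```

Because each `PATCH` body stays below the chosen chunk size, no single request has to exceed the Workers request body limit, even when the layer as a whole does.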

Strangersknowme commented 2 months ago

@captainjapeng You actually don't need to make a tarball of your image first and then import it with regctl. You can copy an oci-dir compatible directory directly to your remote registry. I use podman, but you can check the docs for how to output your image as an OCI directory instead of a tarball. I am using this script now:

#!/bin/bash

# Variables
registry=${1:-"your.registry.endpoint"}  # The first argument is the registry endpoint, default to "your.registry.endpoint"
name=${2:-"imagename"}                   # The second argument is the image name, default to "imagename"
version=${3:-"latest"}                   # The third argument is the image version, default to "latest"
dockerfile_path=${4:-"."}                # The fourth argument is the path to the Dockerfile directory, default to current directory

# Create the 'oci-images' directory if it doesn't exist (-p makes this a no-op when it does)
mkdir -p ./oci-images

# First, build the image normally
podman build -t "${name}:${version}" "${dockerfile_path}"

# Now save the build as an oci-dir
podman save --format=oci-dir --output "./oci-images/${name}" "${name}:${version}"

# Copy the image to the remote registry
./regctl image copy "ocidir://oci-images/${name}:${version}" "${registry}/${name}:${version}"

# Clean up the OCI directory
rm -rf "./oci-images/${name}"

This is also faster, since you can push your image without waiting for a tarball to be extracted.

gabivlj commented 2 months ago

We could also come back to the distribution repository and ask them to review the proposal: https://github.com/opencontainers/distribution-spec/issues/485

gabivlj commented 2 months ago

Closing, as the ./push folder links back to this issue's useful advice and we now have multiple workarounds. Thank you everybody!