Hello!
Blobs are stored at the path `<namespace>/blobs/<digest>`, and manifests follow the same scheme with `blobs` replaced by `manifests`.
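For illustration, here's a minimal sketch of that key layout; the namespace and digest values below are made up:

```ts
// Hypothetical example values, just to show the key scheme described above
const namespace = "myapp";
const digest = "sha256:3b0c26f0e2b1f0d9f6a8c4e5d7b9a1c2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8";

const blobKey = `${namespace}/blobs/${digest}`;         // myapp/blobs/sha256:...
const manifestKey = `${namespace}/manifests/${digest}`; // myapp/manifests/sha256:...
```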
Can you confirm whether `docker save` is the command I should use to get these digests and manifests?
These are the contents of the tar output of `docker save`:
I can't seem to find how the `docker push` command generates its digests for the layers; when I check with `docker save`, it produces different hashes.
Hi! I was able to create a script to upload to serverless-registry via `regctl`, but it's somewhat inefficient because it generates a tar and then extracts it again.
```powershell
param (
    [string]$registry = "your.registry.endpoint",
    [string]$name = "imagename",
    [string]$version = "latest"
)

# Create the 'dist' directory if it doesn't exist
if (-not (Test-Path "./dist")) {
    New-Item -ItemType Directory -Path "./dist"
}

# Build the Docker image and export it in OCI format inside the 'dist' directory
docker buildx build -t ${name}:${version} -o type=oci,dest=./dist/${name}.oci .

# Import the OCI image using regctl inside the 'dist' directory
regctl.exe image import ocidir://dist/${name}:${version} ./dist/${name}.oci

# Remove the OCI image file inside the 'dist' directory
Remove-Item ./dist/${name}.oci

# Copy the image to the remote Docker registry
regctl.exe image copy ocidir://dist/${name}:${version} ${registry}/${name}:${version}

# Clean up the OCI directory
Remove-Item -Recurse -Force ./dist/${name}
```
Hello! Sorry for the delay. I was on PTO and couldn't focus on this thread for a while.
You are correct, `docker save` can export the image. I suspect the digests differ from those of `docker push` because `docker push` gzip-compresses each layer before uploading, which is why it produces different hashes.
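To illustrate the point, here is a small sketch using Node's built-in `crypto` and `zlib` (the layer filename is hypothetical) showing that the same layer yields different digests before and after gzip:

```ts
import { createHash } from "node:crypto";
import { gzipSync } from "node:zlib";
import { readFileSync } from "node:fs";

// Hypothetical layer tar extracted from the `docker save` output
const layer = readFileSync("layer.tar");

// Digest of the uncompressed tar (what you see in the `docker save` output)
const rawDigest = createHash("sha256").update(layer).digest("hex");

// Digest of the gzipped layer (closer to what `docker push` uploads)
const gzDigest = createHash("sha256").update(gzipSync(layer)).digest("hex");

console.log(`uncompressed: sha256:${rawDigest}`);
console.log(`gzipped:      sha256:${gzDigest}`); // a different hash
```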
Good job, btw! This is very nice. I will look into including something in this repo that can push layers to the registry when they're above a certain size.
Oh, alright. One thing I also did was update `chunk.ts` to a higher minimum chunk size because it was throwing a max call stack error. This also made the upload faster.
```ts
// 80 MiB
export const MINIMUM_CHUNK = 1024 * 1024 * 80;
// 5 GiB
export const MAXIMUM_CHUNK = 1024 * 1024 * 1024 * 5;
// 500 MB
export const MAXIMUM_CHUNK_UPLOAD_SIZE = 1000 * 1000 * 500;
```
With these changes, I could successfully upload a ~4.5GB layer.
Can you explain how this script helps with pushing large images? I want to use serverless-registry to push my ML app's images, which can be tens of gigabytes; how does this solve that? I don't know how `regctl` copy works, but does it break each image into smaller chunks?
I can see from the script you provided that we are still using the same serverless-registry endpoint, so won't the request body limit still apply to uploads? For example, an ML app has a layer for the requirements.txt installation, which can be several gigabytes; how does it upload that?
You are saying that you have been successful in uploading a ~4.5GB layer. I just want to understand how this is working. @captainjapeng
The `regctl` client supports the chunked upload mechanism of the distribution specification, while `docker push` does not. That's why it works, and `chunk.ts` specifies how big these chunks should be.
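For anyone curious what that chunked upload looks like on the wire, here is a rough TypeScript sketch of the distribution spec's chunked blob upload flow. The registry URL, image name, and chunk size below are placeholders, and this is a simplification of what `regctl` actually does (no auth, retries, or error handling):

```ts
const registry = "https://your.registry.endpoint"; // placeholder endpoint
const name = "imagename";                          // placeholder image name
const CHUNK = 1000 * 1000 * 500;                   // mirrors MAXIMUM_CHUNK_UPLOAD_SIZE above

async function pushBlobChunked(blob: Uint8Array, digest: string) {
  // 1. Start an upload session; the registry answers with a Location header
  let res = await fetch(`${registry}/v2/${name}/blobs/uploads/`, { method: "POST" });
  let location = res.headers.get("location")!;

  // 2. PATCH one chunk at a time, advertising its byte range via Content-Range
  for (let offset = 0; offset < blob.length; offset += CHUNK) {
    const chunk = blob.subarray(offset, Math.min(offset + CHUNK, blob.length));
    res = await fetch(location, {
      method: "PATCH",
      headers: {
        "content-type": "application/octet-stream",
        "content-range": `${offset}-${offset + chunk.length - 1}`,
        "content-length": String(chunk.length),
      },
      body: chunk,
    });
    location = res.headers.get("location") ?? location;
  }

  // 3. Finalize the upload by supplying the blob's digest
  await fetch(`${location}${location.includes("?") ? "&" : "?"}digest=${digest}`, {
    method: "PUT",
  });
}
```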
@captainjapeng You actually don't even need to make a tarball of your image first and then import it with regctl; you can directly copy an `oci-dir`-compatible directory to your remote registry. I use `podman`, but you can check the docs for how to output your image as an OCI directory instead of a tarball. I am using this script now:
```bash
#!/bin/bash

# Variables
registry=${1:-"your.registry.endpoint"} # First argument: registry endpoint, defaults to "your.registry.endpoint"
name=${2:-"imagename"}                  # Second argument: image name, defaults to "imagename"
version=${3:-"latest"}                  # Third argument: image version, defaults to "latest"
dockerfile_path=${4:-"."}               # Fourth argument: path to the Dockerfile directory, defaults to the current directory

# Create the 'oci-images' directory if it doesn't exist
if [ ! -d "./oci-images" ]; then
    mkdir -p ./oci-images
fi

# First build the image normally
podman build -t ${name}:${version} ${dockerfile_path}

# Now, save the build as an oci-dir
podman save --format=oci-dir --output ./oci-images/${name} ${name}:${version}

# Copy the image to the remote Docker registry
./regctl image copy ocidir://oci-images/${name}:${version} ${registry}/${name}:${version}

# Clean up the OCI directory
rm -rf ./oci-images/${name}
```
Now it isn't slow either, and you can push your image without waiting for a tarball to be extracted.
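For reference, assuming the script above is saved as `push.sh` (a hypothetical filename), it would be invoked like `./push.sh your.registry.endpoint imagename latest .`.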
We could also come back to the distribution repository and ask them to review the proposal: https://github.com/opencontainers/distribution-spec/issues/485
Closing, as the `./push` folder links back to this issue's useful advice, and we have multiple workarounds now. Thank you, everybody!
As quoted above, we can add the layers and manifest directly to the R2 bucket. Can you help me with how to access the layers and manifest directly from Docker after building an image? And are there any specific requirements, such as `httpMetadata`, when uploading to R2?
I'm willing to create a script that does this, but I need a little direction on where to locate and access the layers.
Thanks!