containers / image

Work with containers' images
Apache License 2.0
862 stars 376 forks source link

docker: support for requesting chunks without end offset #2391

Closed giuseppe closed 4 months ago

giuseppe commented 5 months ago

if the specified length for the range is set to -1, then request all the data possible by using the "Range: =-" syntax for the HTTP range.

giuseppe commented 5 months ago

the error I am seeing is that we pass the value -1 to

  • How this is happening? I can’t see any c/storage caller setting this to -1 .

the problem is here https://github.com/containers/storage/blob/main/pkg/chunked/storage_linux.go#L1611-L1617

in some cases the blobSize is set to -1 from c/image, e.g. when pulling quay.io/libpod/alpine-with-seccomp:label.

I could change the code to not use any range (if there are no ranges, then request the whole file), but I thought this would be more flexible. We can restrict it so that only the last chunk can have len=-1

mtrmac commented 5 months ago

OK, so this is

Pedantically, when the size is unknown, while GetDiffer requires it (all three implementations currently do, and there is no “unknown” value documented, the locally correct thing to do is for PutBlobPartial to fail without even trying.

If you do want the conversion path to work on these images… yes, I think adding the “length unknown” case to ImageSourceSeekable / BlobChunkAccessor is the simplest approach. (Notably blobChunkAccessorProxy is simpler when there is only one API to call, having to add a whole separate “no-partial download” case to that would not be worth it). So, document the field value, allow it only for the last chunk.


But, at a higher level, this is really another argument to say that the transparent conversion should not be via PutBlobPartial/GetDiffer at all: let it work like an ordinary non-chunked pull, letting c/image deal with the whole blob and digest verification, and then do the conversion all the way down in overlay’s ApplyDiff.

giuseppe commented 5 months ago

OK, so this is

  • a schema1 image (unknown layer size)
  • the transparent conversion path (needs to read the full file)
  • ImageSourceSeekable not having a “read all” method at all.

Pedantically, when the size is unknown, while GetDiffer requires it (all three implementations currently do, and there is no “unknown” value documented, the locally correct thing to do is for PutBlobPartial to fail without even trying.

If you do want the conversion path to work on these images… yes, I think adding the “length unknown” case to ImageSourceSeekable / BlobChunkAccessor is the simplest approach. (Notably blobChunkAccessorProxy is simpler when there is only one API to call, having to add a whole separate “no-partial download” case to that would not be worth it). So, document the field value, allow it only for the last chunk.

But, at a higher level, this is really another argument to say that the transparent conversion should not be via PutBlobPartial/GetDiffer at all: let it work like an ordinary non-chunked pull, letting c/image deal with the whole blob and digest verification, and then do the conversion all the way down in overlay’s ApplyDiff.

we need to be able to pull via the PutBlobPartial when convert_images is enabled. composefs needs it to calculate the digests and enable fs-verity for each file.

mtrmac commented 5 months ago

As far as c/image is concerned, with UncompressedDigest known, the layer’s singleLayerIDComponent is going to be the DiffID, and the ComposeFS usage is ~invisible to c/image. I naively imagine that whether the call stack says GetDiffer or ApplyDiff does not really matter for fs-verity and other work.


One change that does make a difference, however, is that GetDiffer is parallelized over up to 6 layers concurrently, but layer commit is sequential. That might well be a very strong reason to do the conversion in GetDiffer, not in the commit phase, I didn’t try to measure that.

If this does help, arguably the layer extraction should be similarly made concurrent for non-chunked/non-ComposeFS layers as well. But that redesign would really be a very different conversation (and we might end up deprecating non-ComposeFS layers before that).

giuseppe commented 5 months ago

I've tried adding a new method GetBlob to retrieve the entire blob, but I think the current API is more flexible, with a minimal change.

I've updated the comments

giuseppe commented 5 months ago

thanks, addressed your comments in the last version

rhatdan commented 4 months ago

Seems like there is one comment by @mtrmac that was not addressed?

giuseppe commented 4 months ago

Seems like there is one comment by @mtrmac that was not addressed?

I thought I've addressed them all, which one is still missing?

rhatdan commented 4 months ago

https://github.com/containers/image/pull/2391/files#r1589601556

giuseppe commented 4 months ago

https://github.com/containers/image/pull/2391/files#r1589601556

Thanks. Fixed now