containerd / continuity

A transport-agnostic, filesystem metadata manifest system
https://containerd.io
Apache License 2.0

Proposal: chunk digest #85

Closed AkihiroSuda closed 7 years ago

AkihiroSuda commented 7 years ago

This proposal enables specifying the digest of a partial chunk of a file.

I'd like to integrate this into https://github.com/AkihiroSuda/filegrain

diff --git a/proto/manifest.proto b/proto/manifest.proto
index e2e110f..9363478 100644
--- a/proto/manifest.proto
+++ b/proto/manifest.proto
@@ -39,13 +39,9 @@ message Resource {
     // for regular files.
     uint64 size = 7;

-    // Digest specifies the content digest of the target file. Only valid for
-    // regular files. The strings are formatted in OCI style, i.e. <alg>:<encoded>.
-    // For detailed information about the format, please refer to OCI Image Spec:
-    // https://github.com/opencontainers/image-spec/blob/master/descriptor.md#digests-and-verification
-    // The digests are sorted in lexical order and implementations may choose
-    // which algorithms they prefer.
-    repeated string digest = 8;
+    // Digests specify the content digests of the target file. Only valid for
+    // regular files.
+    repeated DigestEntry digests = 8;

     // Target defines the target of a hard or soft link. Absolute links start
     // with a slash and specify the resource relative to the bundle root.
@@ -85,7 +81,7 @@ message ADSEntry {
     // See also the description about the digest below.
     bytes data = 2;

-    // Digest is a CAS representation of the stream data.
+    // Digests are CAS representations of the stream data.
     //
     // At least one of data or digest MUST be specified, and either one of them
     // SHOULD be specified.
@@ -93,5 +89,26 @@ message ADSEntry {
     // How to access the actual data using the digest is implementation-specific,
     // and implementations can choose not to implement digest.
     // So, digest SHOULD be used only when the stream data is large.
-    string digest = 3;
+    repeated DigestEntry digests = 3;
+}
+
+// DigestEntry encodes digest information for a data stream.
+//
+// It is valid to compose multiple digest entries to represent the digest of a single stream.
+// e.g. [{digest=..,begin=0,end=1073741823},{digest=..,begin=1073741824,end=2147483647}].
+//
+// Entries SHOULD NOT overlap and together SHOULD cover the whole stream.
+message DigestEntry {
+    // Digest strings are formatted in OCI style, i.e. <alg>:<encoded>.
+    // For detailed information about the format, please refer to OCI Image Spec:
+    // https://github.com/opencontainers/image-spec/blob/master/descriptor.md#digests-and-verification
+    // The digests are sorted in lexical order and implementations may choose
+    // which algorithms they prefer.
+    repeated string digest = 1;
+
+    // Begin and end specify the byte offsets of the chunk of the target data
+    // stream that corresponds to the digest.
+    uint64 begin = 2;
+
+    uint64 end = 3;
 }
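
To illustrate consumption, here is a rough Go sketch of chunk-wise verification against these entries. It assumes inclusive end offsets, as in the example above, and uses github.com/opencontainers/go-digest for the OCI-style digest strings; the types and helper names are illustrative, not part of this proposal.

package chunkverify

import (
    "fmt"
    "io"

    "github.com/opencontainers/go-digest"
)

// DigestEntry mirrors the proposed message: alternative digests for one
// byte range of the stream.
type DigestEntry struct {
    Digests []digest.Digest
    Begin   uint64
    End     uint64 // inclusive, per the example above
}

// VerifyChunks reads each chunk from r and checks it against the entry's
// digests; a chunk passes if any one of its digests matches. Assumes the
// digest algorithms used are registered with go-digest.
func VerifyChunks(r io.ReaderAt, entries []DigestEntry) error {
    for _, e := range entries {
        buf := make([]byte, e.End-e.Begin+1)
        if _, err := r.ReadAt(buf, int64(e.Begin)); err != nil {
            return err
        }
        matched := false
        for _, d := range e.Digests {
            if d.Algorithm().FromBytes(buf) == d {
                matched = true
                break
            }
        }
        if !matched {
            return fmt.Errorf("chunk [%d, %d]: no digest matched", e.Begin, e.End)
        }
    }
    return nil
}
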
AkihiroSuda commented 7 years ago

cc @tonistiigi

This proposal might also be relevant to sending a build context that contains large files.

AkihiroSuda commented 7 years ago

@stevvooe @dmcgowan WDYT?

stevvooe commented 7 years ago

@AkihiroSuda I don't think we should integrate a partial solution for this into continuity. I already have a content-based chunking design that handles the issues around digest-based storage, but this is ultimately part of the storage system rather than something built into the format. The reason behind such a design is that different applications may have different storage requirements. Baking this deep into a distribution format hinders both the distribution system's flexibility (it has to implement a chunk-based model) and the storage system's model (it must store chunks as the distribution system sees fit).

In short, the system would work like this:

// Resolve the chunk map for a resource, then fetch each chunk by its digest.
chunkmap := GetChunkMap(resource.Digest)
for _, chunk := range chunkmap {
    data := GetChunk(chunk.Digest)
    // ... reassemble data at chunk.Offset ...
}

Even more interesting are models where you can instantiate an io.ReaderAt with a chunkmap:

readerAt := NewChunkMapReaderAt(chunkmap)
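
A rough sketch of what such a reader might look like, assuming sorted, contiguous chunks shaped like the Chunk message below and a fetch function keyed by digest; the names are illustrative, not an actual continuity API.

package chunkread

import "io"

// Chunk carries the metadata fields of the Chunk message shown below.
type Chunk struct {
    Digest string
    Offset int64
    Length int64
}

type chunkReaderAt struct {
    chunks []Chunk // sorted by offset and covering the whole blob
    fetch  func(digest string) ([]byte, error)
}

// NewChunkMapReaderAt wraps a chunkmap and a chunk fetcher as an io.ReaderAt.
func NewChunkMapReaderAt(chunks []Chunk, fetch func(string) ([]byte, error)) io.ReaderAt {
    return &chunkReaderAt{chunks: chunks, fetch: fetch}
}

func (r *chunkReaderAt) ReadAt(p []byte, off int64) (int, error) {
    n := 0
    for _, c := range r.chunks {
        // Skip chunks that do not overlap the requested range.
        if c.Offset+c.Length <= off || c.Offset >= off+int64(len(p)) {
            continue
        }
        data, err := r.fetch(c.Digest)
        if err != nil {
            return n, err
        }
        // Copy the overlapping part of this chunk into p.
        start := off + int64(n) - c.Offset
        n += copy(p[n:], data[start:])
        if n == len(p) {
            return n, nil
        }
    }
    return n, io.EOF // the request extends past the last chunk
}
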

But the biggest benefit of this approach is that it allows both systems to benefit mutually.

From my research, the best chunk model looked something like this:

message Chunk {
    // Digest identifies the chunk by hash. Generally, this field is always set.
    string digest = 1;

    // Offset indicates the offset into a blob. If the chunk is not part of a
    // blob, this value may be omitted.
    int64 offset = 2;

    // Length specifies the length of the chunk. This should always be set.
    int64 length = 3;

    // Data contains the actual bytes for the chunk. This will be unset when
    // using the Chunk as a metadata object or query object.
    bytes data = 252; // use high number to ensure Data is always last
}

message Blob {
    // Digest identifies the blob by the content hash.
    string digest = 1;

    // length is the total length, in bytes, of the data targeted by the blob
    // descriptor. In blobster, we use "length" and "size" interchangeably, but
    // the value is always serialized under "length".
    int64 length = 2;

    // chunks describes the offset and size of each chunk making up the blob.
    // These should be ordered by offset, but implementations should validate
    // that before processing (see the sketch below). Typically, the "data"
    // field of each chunk will be unset; very small blobs may inline data
    // instead.
    repeated Chunk chunks = 4;
}
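
For instance, the validation mentioned in the chunks comment could be a small pass like this; Blob and Chunk stand in for Go types generated from the messages above (illustrative, not blobster's actual code).

package blobcheck

import "fmt"

// Chunk and Blob stand in for code generated from the messages above.
type Chunk struct {
    Digest string
    Offset int64
    Length int64
}

type Blob struct {
    Digest string
    Length int64
    Chunks []Chunk
}

// ValidateBlob checks the invariants described above: chunks ordered by
// offset, contiguous, and summing to the blob's total length.
func ValidateBlob(b *Blob) error {
    var next int64
    for i, c := range b.Chunks {
        if c.Offset != next {
            return fmt.Errorf("chunk %d: offset %d, want %d", i, c.Offset, next)
        }
        if c.Length <= 0 {
            return fmt.Errorf("chunk %d: non-positive length %d", i, c.Length)
        }
        next += c.Length
    }
    if next != b.Length {
        return fmt.Errorf("chunks cover %d bytes, blob length is %d", next, b.Length)
    }
    return nil
}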

Again, in practice, this makes a lot more sense as an implementation detail of the storage system. The less you specify in the distribution format, the more flexibility the access model gains.

AkihiroSuda commented 7 years ago

OK, makes sense