I dug into this a bit, and early-return can work if this happens (referring to the generated Go bindings):

In the server's Write(srv bytestream.ByteStream_WriteServer) method (a sketch follows these steps):
1) Immediately detect that the blob already exists (in this example, but it could also be detected later).
2) Call srv.SendAndClose(&resp) with a non-nil *WriteResponse as described (with committed_size set to some value that the client will check).
3) Return a nil error.
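Roughly, the server side looks like the following sketch against the generated Go bindings in google.golang.org/genproto/googleapis/bytestream; the server struct, its in-memory blobs index, and the single Recv call are simplifications for illustration, not a real implementation:

```go
package main

import (
	"google.golang.org/genproto/googleapis/bytestream"
)

// server is a stand-in for a real ByteStream server implementation; the
// in-memory blobs index is purely illustrative.
type server struct {
	blobs map[string]bool // keyed by resource name
}

// Write shows the early-return path: detect the duplicate, SendAndClose a
// non-nil *WriteResponse, then return a nil error.
func (s *server) Write(srv bytestream.ByteStream_WriteServer) error {
	// The first WriteRequest carries the resource name.
	req, err := srv.Recv()
	if err != nil {
		return err
	}
	if s.blobs[req.GetResourceName()] {
		// Step (2): non-nil *WriteResponse; the proposal below is to use
		// CommittedSize = -1 for duplicated compressed blobs.
		if err := srv.SendAndClose(&bytestream.WriteResponse{CommittedSize: -1}); err != nil {
			return err
		}
		// Step (3): return nil rather than codes.AlreadyExists, so that
		// existing clients keep working.
		return nil
	}
	// ...otherwise keep calling srv.Recv() and persist the data...
	return nil
}
```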
On the client side (also sketched below):
1) Create the ByteStream_WriteClient.
2) Send the first chunk of a blob that already exists; get a nil error.
3) Send the second chunk of a blob that already exists; get io.EOF (which signifies that the server stopped listening).
4) Call CloseAndRecv(), and receive the *WriteResponse that the server specified in step (2) along with a nil error.
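A corresponding client-side sketch, again against the generated Go bindings; uploadChunks, the pre-split chunks, and the resource-name handling are hypothetical simplifications:

```go
package main

import (
	"context"
	"io"

	"google.golang.org/genproto/googleapis/bytestream"
)

// uploadChunks walks through the client steps above: Send chunks until
// io.EOF, then CloseAndRecv to pick up the *WriteResponse from the server.
func uploadChunks(ctx context.Context, client bytestream.ByteStreamClient, resourceName string, chunks [][]byte) (*bytestream.WriteResponse, error) {
	// Step (1): create the ByteStream_WriteClient.
	stream, err := client.Write(ctx)
	if err != nil {
		return nil, err
	}
	var offset int64
	for i, chunk := range chunks {
		req := &bytestream.WriteRequest{
			ResourceName: resourceName, // only strictly needed on the first request
			WriteOffset:  offset,
			Data:         chunk,
			FinishWrite:  i == len(chunks)-1,
		}
		// Steps (2)-(3): Send returns nil while the server is still reading,
		// and io.EOF once the server has already called SendAndClose.
		if err := stream.Send(req); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		offset += int64(len(chunk))
	}
	// Step (4): receive the *WriteResponse the server sent, plus the RPC error
	// (nil if the server's Write method returned nil).
	return stream.CloseAndRecv()
}
```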
Note that in step (3) on the server, if we return a non-nil error such as the gRPC AlreadyExists code instead of nil, then in step (4) on the client, CloseAndRecv() returns a nil *WriteResponse and the error set by the server (e.g. AlreadyExists). I think this makes more sense, but changing to this behaviour would break existing clients, so I don't think we should do that. Instead, I think we should state explicitly in the quoted remote_execution.proto block that the server's step (3) should return a nil error.

Then I think we should require the server to set a placeholder committed_size value of -1 for duplicated compressed blobs, giving clients an unambiguous value to check for while still supporting early returns.
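As a sketch of what that check could look like on the client (the function name and error message are just placeholders, not part of any existing API):

```go
package main

import (
	"fmt"

	"google.golang.org/genproto/googleapis/bytestream"
)

// checkWriteResponse interprets the proposed convention: committed_size == -1
// means the compressed blob already existed on the server; otherwise it should
// match the number of (compressed) bytes this client sent.
func checkWriteResponse(resp *bytestream.WriteResponse, bytesSent int64) error {
	switch resp.GetCommittedSize() {
	case -1:
		return nil // duplicated blob, server returned early
	case bytesSent:
		return nil // normal full upload
	default:
		return fmt.Errorf("server committed %d bytes, client sent %d", resp.GetCommittedSize(), bytesSent)
	}
}
```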
Context: https://github.com/bazelbuild/bazel/issues/14654
There is an ambiguity with the current way that the ByteStream.Write protocol is specified for compressed blobs. The relevant part of the spec with the ambiguity is here: https://github.com/bazelbuild/remote-apis/blob/636121a32fa7b9114311374e4786597d8e7a69f3/build/bazel/remote/execution/v2/remote_execution.proto#L256-L264
The ambiguity is: what does "full size of the uploaded file" mean? Does it mean "compressed size" or "uncompressed size"? If it's compressed size, then it's not useful info to be returned in the response, since compressed size can vary depending on the compression level used by the other client that uploaded the file.
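To illustrate why the compressed size is not a stable property of the blob itself, here is a small standalone example (it uses gzip from the Go standard library purely for illustration; REAPI compressed blobs use zstd, but the same variability with compression level applies):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// Demonstrates that the compressed size of the same payload depends on the
// compression level chosen by the uploader, so "compressed size" is not
// well-defined independently of the client that uploaded the blob.
func main() {
	payload := bytes.Repeat([]byte("some highly compressible build output\n"), 1000)
	for _, level := range []int{gzip.BestSpeed, gzip.DefaultCompression, gzip.BestCompression} {
		var buf bytes.Buffer
		w, _ := gzip.NewWriterLevel(&buf, level)
		w.Write(payload)
		w.Close()
		fmt.Printf("level %d: %d compressed bytes (uncompressed: %d)\n", level, buf.Len(), len(payload))
	}
}
```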
Note also that Bazel 5.0.0 effectively expects this to match the compressed size that it uploaded. Whether or not this is a bug, it means that servers have to wait for Bazel 5.0.0 to upload all chunks before they can reply with a committed_size that Bazel 5.0.0 will be happy with, effectively preventing the "early-exit" strategy when an object already exists in the cache.
@mostynb suggested a workaround on that thread: the server reads and discards all of the received data before replying. That will work, but it's unclear how significant a performance impact it has in practice compared to early-exit. Hoping that some folks with REAPI experience can chime in.
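For reference, a minimal sketch of that discard-based workaround on the server side, assuming the same generated Go bindings (writeDiscarding is a hypothetical helper; resource-name parsing and validation are omitted):

```go
package main

import (
	"io"

	"google.golang.org/genproto/googleapis/bytestream"
)

// writeDiscarding handles a Write for a blob the server already has: it keeps
// reading and discarding chunks until the client finishes, then reports a
// committed_size equal to the number of bytes the client actually sent, which
// is what Bazel 5.0.0 expects for compressed uploads.
func writeDiscarding(srv bytestream.ByteStream_WriteServer) error {
	var received int64
	for {
		req, err := srv.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		received += int64(len(req.GetData())) // the data itself is thrown away
		if req.GetFinishWrite() {
			break
		}
	}
	return srv.SendAndClose(&bytestream.WriteResponse{CommittedSize: received})
}
```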