fussybeaver / bollard

Docker daemon API in Rust
Apache License 2.0
911 stars, 134 forks

Custom Build Outputs `--output` #415

Closed: levinwinter closed this issue 3 months ago

levinwinter commented 4 months ago

I would like to directly output the Docker build to local disk without creating an intermediate image. For this, Docker/BuildKit has the option of specifying an --output flag (docs). I tried to hunt around bollard, but it seems to me that this API isn't exposed at the moment. Since the Docker Engine API version bollard is targeting already includes this feature, I was wondering whether this is already possible and I simply didn't find it in the docs or else what's required to make this work. Thanks :)

fussybeaver commented 4 months ago

There should be an integration test that demonstrates how to export an OCI image using the GRPC API: https://github.com/fussybeaver/bollard/blob/master/tests/export_test.rs

Note that the link you provided uses the moby HTTP API (as opposed to buildkit's formal GRPC API). Although this is also supported in Bollard through the build_image method, you cannot export OCI images using that API, as documented here.

levinwinter commented 4 months ago

Thank you for the pointers! I'm not super familiar with moby/buildkit, so I took some time to get a basic understanding.

Instead of exporting OCI images, I'd like to use the local or tar exporters (docs) that simply dump the file system of the last layer. I managed to get this up and running by adding a tar option to the ImageExporterEnum and using the docker container gRPC driver.

However, I would prefer to use the build_image method since it's easier and should, in theory (?), also support this. I tried extending that one, but I think it would also need to use the /grpc endpoint as opposed to /session (I keep getting issues with /moby.filesync.v1.FileSend/diffcopy not being recognized by the server).

Let me know if you'd be okay with adding the local/tar exporter, and if so, where you see best fit. I'd be happy to prepare a PR!

fussybeaver commented 4 months ago

Yes, adding the local and tar exporters to build_image would require a couple of changes in Bollard and a PR is very welcome. I'm not so sure you need to handle the /grpc endpoint - this is actually created by the moby docker server if you toggle the buildkit code path by providing a session as part of the BuildImageOptions.

You get a /moby.filesync.v1.FileSend/diffcopy error because buildkit initiates a GRPC request to save the exported payload to disk, but the filesend provider isn't registered as a routable part of the /session endpoint.

One option is to add the filesend plumbing to the /session endpoint and parameterise it somehow, presumably by adding the output field to BuildImageOptions, though how that field is parsed and interpreted is probably what needs some thought.

levinwinter commented 4 months ago

Awesome! I managed to get the /session endpoint to work with diffcopy. I already tried this before, but I didn't know I also need to register it with the X-Docker-Expose-Session-Grpc-Method header. Exporting a single file using tar is now working.
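The registration step described above can be sketched as follows. This is a minimal illustration, assuming the header is sent once per exposed gRPC method and that its value is the full method path; only the header name and the FileSend path come from this thread, and `grpc_method_path` and `session_method_headers` are hypothetical helpers, not part of bollard's API.

```rust
/// Header name quoted in the discussion above.
const SESSION_METHOD_HEADER: &str = "X-Docker-Expose-Session-Grpc-Method";

/// Builds the full gRPC method path for a service and method,
/// e.g. "/moby.filesync.v1.FileSend/DiffCopy".
/// Hypothetical helper, not part of bollard.
fn grpc_method_path(service: &str, method: &str) -> String {
    format!("/{}/{}", service, method)
}

/// Returns one (header name, header value) pair per exposed gRPC method.
/// Assumption: the header is repeated once for every method the client exposes.
fn session_method_headers(methods: &[(&str, &str)]) -> Vec<(&'static str, String)> {
    methods
        .iter()
        .map(|&(service, method)| (SESSION_METHOD_HEADER, grpc_method_path(service, method)))
        .collect()
}
```

Calling `session_method_headers(&[("moby.filesync.v1.FileSend", "DiffCopy")])` yields a single pair that could be attached to the /session upgrade request; on current master the grpc_handle method mentioned later in this thread is meant to take care of this.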

I'm having issues with the local exporter however. The read loop in FileSendImpl seems to "hang", probably because the protocol is more complex when sending multiple files? Is there a reference implementation for this somewhere? I tried to hunt around buildkit, but I'm not quite sure what exactly is expected.

The data that I receive in the loop is just empty packets (the number of empty packets being equal to the number of files that the last layer has).

fussybeaver commented 4 months ago

The reference implementation for diffcopy and filesend (curiously called filesync in buildkit) is here: https://github.com/moby/buildkit/blob/44ebf9071db49821538cd37c2687dd925c7c3661/session/filesync/filesync.go#L78

However, the whole end-to-end flow is spread across the moby, buildkit and buildx repositories, and is quite difficult to follow.

It's possible that some information is stored in the GRPC header metadata, which is not handled in Bollard's implementation.

Be sure to rebase from master as the session headers should be registered uniformly with the grpc_handle method.

levinwinter commented 4 months ago

Thank you! I'm now using the grpc_handle method!

I think that the correct protocol is described here in the fsutil repository.

To receive meaningful data when selecting the local exporter (which sends multiple files), I needed to change the type of the streamed messages in the diffcopy method of the FileSend service. While before the messages were deserialized to empty BytesMessage (i.e. BytesMessage { data: [] }), I now receive meaningful fsutil.types.Packets that contain the filenames of the export. Do you have any idea why that could be?

// FileSync exposes local files from the client to the server.
service FileSync{
    rpc DiffCopy(stream fsutil.types.Packet) returns (stream fsutil.types.Packet);
    rpc TarStream(stream fsutil.types.Packet) returns (stream fsutil.types.Packet);
}

// FileSend allows sending files from the server back to the client.
service FileSend{
-   rpc DiffCopy(stream BytesMessage) returns (stream BytesMessage);
+   rpc DiffCopy(stream fsutil.types.Packet) returns (stream fsutil.types.Packet);
}

To add to this: when exporting using tar, the messages still need to be deserialized as BytesMessage; only when exporting using local does one need to use fsutil.types.Packet. I guess depending on which exporter is passed as an argument, we could choose the correct implementation. Though I must say I'm not sure why this is the case in the first place.
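One possible explanation for the empty messages, offered as a hedged sketch rather than a confirmed diagnosis: protobuf decoders skip fields they do not know. If the Packet message carries its file metadata and payload in fields other than 1 (an unverified assumption about fsutil's wire.proto), then decoding it as a message whose only field is `bytes data = 1` succeeds but produces empty data. The hand-rolled decoder below handles only varint and length-delimited fields and is purely illustrative; it is not how tonic or prost decode messages.

```rust
/// Reads a base-128 varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // malformed: varint longer than 64 bits
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Decodes `buf` as a message whose only known field is `bytes data = 1`.
/// Every other field is skipped, as the protobuf wire format allows.
fn decode_bytes_message(buf: &[u8]) -> Option<Vec<u8>> {
    let mut pos = 0;
    let mut data = Vec::new();
    while pos < buf.len() {
        let tag = read_varint(buf, &mut pos)?;
        let (field, wire_type) = (tag >> 3, tag & 0x7);
        match wire_type {
            // Varint field: read and discard the value.
            0 => {
                read_varint(buf, &mut pos)?;
            }
            // Length-delimited field: keep it only if it is field 1.
            2 => {
                let len = read_varint(buf, &mut pos)? as usize;
                let end = pos.checked_add(len)?;
                let payload = buf.get(pos..end)?;
                if field == 1 {
                    data = payload.to_vec();
                }
                pos = end;
            }
            // Fixed-width and group wire types are left out of this sketch.
            _ => return None,
        }
    }
    Some(data)
}
```

With this decoder, the bytes `[0x12, 0x03, b'f', b'o', b'o']` (a length-delimited field 2) decode to an empty `data`, while `[0x0a, 0x02, 1, 2]` (field 1) decode to `[1, 2]`. That would match one empty BytesMessage arriving per file if each file is announced by a packet with no field 1.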

fussybeaver commented 4 months ago

I've noticed that the buildkit protobuf has generated a separate FileSync implementation that takes a Packet: https://github.com/fussybeaver/bollard/blob/cf88562401ce4db01cb558373e59e3dcb39f61ef/codegen/proto/src/generated/moby.filesync.v1.rs#L867-L1128

So, maybe you just need to implement the trait with an appropriate Provider that implements the fsutil protocol you pointed out, and hook it up to the session endpoint.

levinwinter commented 4 months ago

I tried to implement FileSync, but that seems to be the wrong gRPC service/endpoint (/moby.filesync.v1.FileSync/DiffCopy vs. /moby.filesync.v1.FileSend/DiffCopy). From what I can understand when looking at the Go implementation, they have some sort of raw gRPC stream (of the FileSend service) and just serialize either the BytesMessage or Packet. To be honest, I'm a bit stuck since I don't know whether something equivalent is possible using tonic.

The only "idea" that comes to my mind is to copy the generated protobuf code and have a manual/alternative implementation at hand. But this feels super hacky and I'm sure there must be a better way. If you have no idea, I could also ask on moby/buildkit.

fussybeaver commented 4 months ago

Sounds a little weird, one thing you could try is to enable the jaeger tracing interface, which will let you drill down into the payloads sent from buildkit. https://github.com/moby/buildkit?tab=readme-ov-file#opentelemetry-support

Regardless, do keep this thread up-to-date if you get a breakthrough somehow by hacking around the protobuf files.

levinwinter commented 4 months ago

Sorry for the delay, I was not working on this :)

For the moment, my idea is to copy-paste the bit of auto-generated code that I get when changing the protobuf file to DiffCopy(stream fsutil.types.Packet) into the repo and simply wire that up if the local exporter is selected.
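The wiring described above amounts to picking the message type of the FileSend/DiffCopy stream from the selected exporter. A minimal sketch of that selection, with `Exporter`, `DiffCopyMessage` and `diffcopy_message_for` as hypothetical names that do not exist in bollard:

```rust
/// Exporter selected by the caller (the two exporters discussed in this thread).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Exporter {
    Tar,
    Local,
}

/// Message type expected on the FileSend/DiffCopy stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiffCopyMessage {
    /// A stream of raw byte chunks (BytesMessage).
    BytesMessage,
    /// A stream of fsutil.types.Packet messages.
    Packet,
}

/// Chooses the stream message type based on the observations in this thread:
/// tar arrives as BytesMessage, local arrives as fsutil.types.Packet.
fn diffcopy_message_for(exporter: Exporter) -> DiffCopyMessage {
    match exporter {
        Exporter::Tar => DiffCopyMessage::BytesMessage,
        Exporter::Local => DiffCopyMessage::Packet,
    }
}
```

In a real implementation the returned value would decide which generated service (the stock FileSend or the copied Packet-based variant) gets registered on the session.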

As for the "interface" on how to integrate it into bollard, I was thinking of adding a field outputs to BuildImageOptions that takes an Option<ImageBuildOutput>. Currently, there is already a generic ImageBuildOutput in bollard, but perhaps a nicer interface would be something like this.

use serde::Serialize;
use std::hash::Hash;

enum ImageBuildOutput<T>
where
    T: Into<String> + Eq + Hash + Serialize,
{
    /// Exports a tarball to the specified path.
    Tar(T),
    /// Exports the filesystem to the specified path.
    Local(T),
}
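To illustrate how such an enum relates to the `--output` flag from the issue title, here is a std-only sketch that renders each variant in the `type=<exporter>,dest=<path>` form accepted by `docker build --output`. The `BuildOutput` type and its methods are hypothetical and deliberately named differently from bollard's ImageBuildOutput; how bollard would actually encode the value for the engine is not settled in this thread.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, non-generic variant of the proposed enum, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BuildOutput {
    /// Exports a tarball to the specified path.
    Tar(PathBuf),
    /// Exports the filesystem to the specified path.
    Local(PathBuf),
}

impl BuildOutput {
    /// Exporter name as used in `--output type=<name>,dest=<path>`.
    fn exporter_type(&self) -> &'static str {
        match self {
            BuildOutput::Tar(_) => "tar",
            BuildOutput::Local(_) => "local",
        }
    }

    /// Destination path on the client's disk.
    fn dest(&self) -> &Path {
        match self {
            BuildOutput::Tar(path) | BuildOutput::Local(path) => path,
        }
    }

    /// Renders the value in the CLI's `--output` syntax.
    /// Note: paths containing commas would need quoting, which this sketch ignores.
    fn to_output_spec(&self) -> String {
        format!("type={},dest={}", self.exporter_type(), self.dest().display())
    }
}
```

For example, `BuildOutput::Tar(PathBuf::from("out.tar")).to_output_spec()` gives `type=tar,dest=out.tar`. Using `PathBuf` instead of a generic `T` is a design choice of this sketch; it keeps the destination typed as a path at the cost of the Serialize bound in the proposal.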

Just sharing my ideas and keeping you updated. Will let you know once the PR is up :)