bazelbuild / remote-apis

An API for caching and execution of actions on a remote system.
Apache License 2.0
331 stars 117 forks source link

Add max_cas_blob_size_bytes to CacheCapabilities message #278

Open diegohavenstein opened 11 months ago

diegohavenstein commented 11 months ago

We need this field to improve the performance of Bazel when using remote cache, since by doing this we can determine the CAS entry upper limit size, thus skipping uploads that would fail due to size constraints.

The issue on Bazel side is https://github.com/bazelbuild/bazel/issues/18449

google-cla[bot] commented 11 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

diegohavenstein commented 11 months ago

Though I approve of this change, I do think we're still leaving some things undefined that are worth specifying.

Notably: what's the maximum individual/combined size of Tree messages UpdateActionResult()/GetActionResult() is willing to tolerate to perform completeness checking on? Relatedly, what's the maximum size of a Directory message that may be contained in a Tree?

Buildbarn has the ability to limit these through its maximumTotalTreeSizeBytes and maximumMessageSizeBytes options.

(Sorry for my lack of deep knowledge in this area) Does it make sense to have fine-granular settings for these? Not sure how they exactly relate to max blob size in the CAS Are you referring to UpdateActionResult()/GetActionResult() in ActionCacheServer.java?

EdSchouten commented 11 months ago

Are you referring to UpdateActionResult()/GetActionResult() in ActionCacheServer.java?

Is that ActionCacheServer.java in bazel-buildfarm? If so... yes?

GetActionResult() is only permitted to return ActionResults if all of the output files it references exist. ActionResults may also refer to output directories (Tree objects), which in its turn point to files. If max_cas_blob_size_bytes is set to 100 TB, that still doesn't mean that it's okay to upload a 100 TB Tree object and provide it to a call to UpdateActionResult(). There would be no way for a remote cache to validate it in a reasonable amount of time/space.

bergsieker commented 11 months ago

Are you referring to UpdateActionResult()/GetActionResult() in ActionCacheServer.java?

Is that ActionCacheServer.java in bazel-buildfarm? If so... yes?

GetActionResult() is only permitted to return ActionResults if all of the output files it references exist. ActionResults may also refer to output directories (Tree objects), which in its turn point to files. If max_cas_blob_size_bytes is set to 100 TB, that still doesn't mean that it's okay to upload a 100 TB Tree object and provide it to a call to UpdateActionResult(). There would be no way for a remote cache to validate it in a reasonable amount of time/space.

I agree that there are more types of limits that could be specified here--total blobs in a tree, total tree depth, max files/subdirectories in a level, etc. However, I think this limit stands alone, provides reasonable value, and is easy to understand. I think committing this on its own is good, and we can have a separate conversation about whether specifying additional limits would be valuable.

sluongng commented 11 months ago

Is it worth introducing a new ServerLimits/ServerConstraints message that contains various different limits to be introduced in the future?

diegohavenstein commented 11 months ago

Is it worth introducing a new ServerLimits/ServerConstraints message that contains various different limits to be introduced in the future?

That sounds like a good idea to keep CacheCapabilities stable, yes. I'll push a patch

diegohavenstein commented 11 months ago

Is it worth introducing a new ServerLimits/ServerConstraints message that contains various different limits to be introduced in the future?

I had a look again, I don't think it's good in this context, because many things in the CacheCapabilities could arguably be in the new proposed message (e.g. max_batch_total_size_bytes or supported_compressors), since they describe limits/constraints. As such I think the max_cas_blob_size_bytes is in the right place as of now.

Do you agree @sluongng ?

sluongng commented 11 months ago

Sound reasonable to me. Thanks for investigating! 🤗

diegohavenstein commented 10 months ago

Are you referring to UpdateActionResult()/GetActionResult() in ActionCacheServer.java?

Is that ActionCacheServer.java in bazel-buildfarm? If so... yes? GetActionResult() is only permitted to return ActionResults if all of the output files it references exist. ActionResults may also refer to output directories (Tree objects), which in its turn point to files. If max_cas_blob_size_bytes is set to 100 TB, that still doesn't mean that it's okay to upload a 100 TB Tree object and provide it to a call to UpdateActionResult(). There would be no way for a remote cache to validate it in a reasonable amount of time/space.

I agree that there are more types of limits that could be specified here--total blobs in a tree, total tree depth, max files/subdirectories in a level, etc. However, I think this limit stands alone, provides reasonable value, and is easy to understand. I think committing this on its own is good, and we can have a separate conversation about whether specifying additional limits would be valuable.

I'm good to ship this as-is. What is the process? I can't merge commits here it seems @bergsieker