TraceMachina / nativelink

Bazel RBE with CAS server implementation in Rust. The free and open source cache and remote execution service, prioritizing stability and speed for the people that need it.
https://docs.nativelink.com
Apache License 2.0
236 stars 46 forks source link

Make bytestream limits configurable #1076

Closed MarcusSorealheis closed 3 days ago

MarcusSorealheis commented 5 days ago

Description

I can imagine there are lots of use cases "where a simple rule can generate lots of stdout." @ltekieli pointed out this issue and shared a patch for how it could be fixed. I copied it and added a test to ensure that the configured limit is enforced.

I may need to figure out how to document this change better. Please advise.

Fixes #1070

Type of change

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist


This change is Reviewable

ltekieli commented 5 days ago

Hello, the proposed patch from #1070 is quite a simple workaround. IMHO those encoding/decoding limits should be configurable from json for any grpc service. I stumbled on similar fixes that were applied in another project https://github.com/temporalio/sdk-core/pull/693

MarcusSorealheis commented 4 days ago

@ltekieli the more this is discussed, the more it feels like this configurability may not be the right approach. If you look at this post from 2016 on the optimal size limit for streaming large payloads, the GRPC team suggest that we limit to 64KiB ("oral tradition"). Here's the issue: https://github.com/grpc/grpc.github.io/issues/371

MarcusSorealheis commented 4 days ago

@allada @aaronmondal ok. You all are the experts. I'll bring this down and introduce the parameter in the config in an update.

Thank you for the review.

MarcusSorealheis commented 3 days ago

So, it looks like the local-remote-execution integration test is a strong predictor of failure and flakiness. I appreciate that it exists. I've made a change to ensure that the default value for a field exists in the event that it is not supplied. It's now stable and this PR should be ready for review and merging any minute now, after the test passes.