incorrect ActionResult validation

mostynb commented 5 years ago

I recently added some ActionResult validation, including one check that seems to be technically incorrect but still useful in practice: zero-length serialized ActionResult messages are currently considered invalid.

This is technically incorrect, because an ActionResult might legitimately have no outputs apart from the exit code (and if that exit code is 0 then the serialized value could be zero-length). One possible use case for this is a sanity check. But this is bad, because a zero-length file is perhaps the most common form of data corruption, and we have no way to recognise when this is corruption and when it is legitimate.

However in practice, there are many cases where this type of ActionResult can be reasonably be expected never to occur (eg when REAPI is only used for compilation jobs, like recc or goma jobs). And in that case, maybe it's worth keeping this validation, off by default and hidden by a command line flag. Would such a flag be welcome, or should I just remove this validation?

mostynb commented 5 years ago

@ob ^

ob commented 5 years ago

I don't think this should be a problem since an empty action (no outputs) with exit 0 that is not found in the cache would be run locally. I'd rather have the peace of mind of not having builds break due to a legit action that got truncated to zero bytes by the FS.

mostynb commented 5 years ago

@ob: I'm not sure that would work in all cases though: 1) If the client builds locally and tries to upload to the cache, if bazel-remote returns an error for these empty blobs, then the client will probably notice the error and stop. On the other hand, if we modified bazel-remote to accept empty blobs but not actually cache them, then clients (and remote execution workers) can get stuck in infinite loops trying to fetch such blobs. 2) If you're also using remote execution, the result would always be rejected by bazel-remote and the client will never be able to get the response. 3) If you had many of these jobs, and they each take a long time to run- you definitely wouldn't want to run them all locally.

That said, I think these kinds of legitimate jobs are not very likely, and in my bazel-remote setup they are impossible (I only use bazel remote for compilation jobs, and they always produce non-zero ActionResult messages).

So I think a command line flag would be the best solution, until we can fix this problem in REAPIv3.

ob commented 5 years ago

Good points. I hadn't considered remote execution. In that case putting the functionality behind a flag seems ok. Thanks also for following up with REAPIv3.

mostynb commented 5 years ago

I think I found a solution without a flag (add a worker name to the ActionResult metatada if it's not already set, which makes the serialized form non-zero length): https://github.com/buchgr/bazel-remote/pull/122

buchgr / bazel-remote

incorrect ActionResult validation #121