Ideally we want 100% of queries to run without error. Realistically, we can probably allow a small percentage of them to fail without invalidating the benchmarks.
We need to report the error % in the results.
We should have a way to specify the allowed error threshold; once it is exceeded, the benchmarks are considered invalid and are aborted. Ideally, we should also be able to show at least the last error in more detail (noticing and debugging intermittent errors in benchmarks can be a pain right now).
The default should still require a high success rate, e.g. 99% of queries must succeed. It should not be so strict as to be indistinguishable from "not even a single error is allowed", especially for slower benchmarks, but strict enough that systemic intermittent errors are likely to trigger it.
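
To make the proposal concrete, here is a rough sketch of how the tracking could work. It is not tied to the existing benchmark runner: `ErrorTracker`, `BenchmarkAborted`, `max_error_percent` and `min_samples` are all hypothetical names, and the `min_samples` guard is just one possible way to keep a single early failure in a slow benchmark from aborting the run.

```python
from dataclasses import dataclass
from typing import Optional


class BenchmarkAborted(RuntimeError):
    """Raised when the error percentage exceeds the allowed threshold."""


@dataclass
class ErrorTracker:
    # Hypothetical default: at most 1% of queries may fail (99% must succeed).
    max_error_percent: float = 1.0
    # Assumption: do not evaluate the threshold until this many queries have
    # run, so one early failure does not invalidate a slow benchmark.
    min_samples: int = 100
    total: int = 0
    failed: int = 0
    last_error: Optional[BaseException] = None

    @property
    def error_percent(self) -> float:
        """Error % to report in the results."""
        return 100.0 * self.failed / self.total if self.total else 0.0

    def record(self, error: Optional[BaseException] = None) -> None:
        """Record one query outcome; pass the exception if the query failed."""
        self.total += 1
        if error is not None:
            self.failed += 1
            self.last_error = error  # kept so it can be shown in detail
        self._check()

    def _check(self) -> None:
        if self.total < self.min_samples:
            return
        # Compare as failed * 100 > threshold * total to avoid dividing.
        if self.failed * 100 > self.max_error_percent * self.total:
            raise BenchmarkAborted(
                f"{self.error_percent:.2f}% of {self.total} queries failed "
                f"(allowed: {self.max_error_percent}%); "
                f"last error: {self.last_error!r}"
            )
```

A runner would call `tracker.record()` after each successful query and `tracker.record(exc)` after each failed one, then include `tracker.error_percent` in the final report. Catching `BenchmarkAborted` at the top level gives one place to print the last error in full.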