We've been seeing a production bug where parallel gather_evidence tool calls are clobbering our environment state. We can try to prompt engineer our way out of this, but deterministically forcing ordered tool calls feels safer.
This change forces them into ordered calls along with a test. We can turn it off with an env var if we'd like in the future. I don't think this should go into settings since it's effectively a bug to turn on for now.
We've been seeing a production bug where parallel gather_evidence tool calls are clobbering our environment state. We can try to prompt engineer our way out of this, but deterministically forcing ordered tool calls feels safer.
This change forces them into ordered calls along with a test. We can turn it off with an env var if we'd like in the future. I don't think this should go into settings since it's effectively a bug to turn on for now.