Limit the maximum number of values reuse in `consume`

Stranger6667 commented 1 month ago

Context & Description

I have a use case in Schemathesis where I'd like to reuse the same bundle value up to N times in a single state machine execution. Stateful tests in Schemathesis make "linked" API calls, and I'd like to cut down on API calls that add little value but still consume the test budget.

At the moment there are a few options:

- No consume. This is the status quo, and from what I observe, state machines tend to overuse the same value. For example, Schemathesis makes a POST call that creates a new resource, and then that resource is used in far too many identical calls, e.g. 10 consecutive DELETE calls. For this use case, 2 DELETE calls are useful (to detect use after free), but 10 likely are not.
- Use consume. In this case it becomes problematic to detect use after free, as there is no way to use the same value from a bundle twice, e.g. to make 2 DELETE calls or DELETE -> GET.
- Configure a lower number of steps. This likely won't help, as it will still be possible to make transitions that are not useful.

Note that this tends to happen more in the early state machine executions; later, the sequences become more diverse. However, each transition involves an API call, so the cost of each transition is quite high (I observed responses with an average latency of 0.8-1 s in one of the APIs I work with). I'd therefore like to avoid making unnecessary calls.

Proposed solution

Support a "counting" consume that tracks the number of usages per value in a bundle and removes a value only once its limit is reached.

The API could be like this:

consume(bundle, max_usages_per_value=3)
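
To make the intended semantics concrete, here is a minimal plain-Python sketch of a "counting" bundle. It is not Hypothesis code: `CountingBundle`, `add`, and `draw` are hypothetical names used only for illustration, and the sketch only models the bookkeeping (count usages per value, drop the value when the limit is hit).

```python
class CountingBundle:
    """Hypothetical stand-in for a bundle whose values expire after N draws."""

    def __init__(self, max_usages_per_value):
        if max_usages_per_value < 1:
            raise ValueError("max_usages_per_value must be at least 1")
        self.max_usages_per_value = max_usages_per_value
        # Each entry is a [value, usages_so_far] pair.
        self._entries = []

    def add(self, value):
        """Store a new value with a usage count of zero."""
        self._entries.append([value, 0])

    def draw(self, index):
        """Return the value at `index`, removing it once its limit is reached."""
        entry = self._entries[index]
        entry[1] += 1
        if entry[1] >= self.max_usages_per_value:
            # The limit is reached: this draw is the last one for this value.
            del self._entries[index]
        return entry[0]

    def __len__(self):
        return len(self._entries)


bundle = CountingBundle(max_usages_per_value=2)
bundle.add("resource-1")
first = bundle.draw(0)   # first DELETE
second = bundle.draw(0)  # second DELETE (the use-after-free check)
remaining = len(bundle)  # the value is gone after its second use
```

With a limit of 2, the same resource can be used for a DELETE followed by a second DELETE or a GET, and is then no longer available for further calls.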

Let me know what you think about the approach, and whether there is an easier way to achieve the main goal of reducing the number of undesired transitions (implementing preconditions manually looks somewhat more complicated).

Zac-HD commented 1 month ago

Hmm. I'm definitely sympathetic to the desire to get a better distribution here, but leaning on tricks with `consumes()` feels like we're solving the wrong problem. For example, `bundle | consumes(bundle) | bundle` will consume about one in three times you draw from it...
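
The "one in three" figure can be checked with a small simulation. This is a rough sketch in plain Python, not Hypothesis code: it assumes each of the three alternatives in the union is picked with equal probability (which Hypothesis does not promise), and it follows a single value as if it were the one drawn every time. The function name `uses_until_consumed` and the parameter `plain_alternatives` are made up for this illustration.

```python
import random


def uses_until_consumed(rng, plain_alternatives=2):
    """Count draws of one value up to and including the consuming draw."""
    uses = 1
    # Outcome 0 stands for the consuming alternative; every other outcome
    # stands for one of the plain, non-consuming alternatives.
    while rng.randrange(plain_alternatives + 1) != 0:
        uses += 1
    return uses


rng = random.Random(0)
samples = [uses_until_consumed(rng) for _ in range(20_000)]

# Average number of times a value is used before it disappears.
mean_uses = sum(samples) / len(samples)
# Share of all draws that were consuming draws.
consumed_fraction = len(samples) / sum(samples)

print(f"mean uses per value: {mean_uses:.2f}")
print(f"fraction of consuming draws: {consumed_fraction:.2f}")
```

Under these assumptions the number of uses per value is geometrically distributed with a mean of about 3, so the ratio of plain to consuming alternatives acts as a soft version of a usage limit: there is no hard cap, but long runs on the same value become increasingly unlikely.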

My goal is always for Hypothesis to let users express what should be possible in a fairly direct and natural way, without needing to express an opinion on anything else or concern themselves with the details of how. Of course we always fall short of that in some places and some ways, but it's good to have an aspiration in mind. It seems like the `max_usages_per_value=` argument isn't really about limiting uses per value, but rather about getting a different distribution: one that can detect use-after-free but spends most of its time on other, more useful things. There may not be a satisfying solution here, but I'll think about it.

Stranger6667 commented 1 month ago

Aha, I hadn't thought about combining bundles in such a way. I think it will significantly improve the situation!

Thank you for the detailed answer, Zac!

HypothesisWorks / hypothesis

Limit the maximum number of values reuse in `consume` #4085
