I was not expecting to go this deep into the caching logic, but it became increasingly clear while working on #3962 that our internal `mark_invalid` for misaligned datas requires some more thought, and we were only getting away with it before by luck.
The issue goes something like this. Say a test function draws an integer and then a boolean. That means a data with an ir prefix of `[0, True]` is valid, and a data of `[0, 2]` is misaligned. The latter hits an internal `mark_invalid` and has `Status.INVALID`. Now, how should `[0, 2]` be cached? Its resulting ir tree is just `[0]`, since it was invalid "between" 0 and 2 and we don't know what value the test function would have drawn in this scenario. But mapping `[0]: invalid` in our cache would be incorrect! In this case `[0]` is actually an overrun, because we draw twice. Moreover, some extensions of `[0]`, like `[0, True]`, are valid while others are not.
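To make the scenario concrete, here is a toy model (assumed names, not Hypothesis's real API) of a test that draws an integer then a boolean, replayed against a choice prefix. It shows why the recorded tree for `[0, 2]` is just `[0]`, and why `[0]` alone is an overrun rather than invalid:

```python
OVERRUN = "overrun"
INVALID = "invalid"
VALID = "valid"

def run_test(prefix):
    """Replay `prefix` against a test that draws an int, then a bool.

    Returns (status, recorded_tree): the choices actually recorded
    before the run stopped.
    """
    draws = iter(prefix)
    # first draw: an integer
    try:
        value = next(draws)
    except StopIteration:
        return OVERRUN, []  # ran out of choices mid-test
    if isinstance(value, bool) or not isinstance(value, int):
        return INVALID, []  # misaligned: wrong choice type
    # second draw: a boolean
    try:
        flag = next(draws)
    except StopIteration:
        return OVERRUN, [value]
    if not isinstance(flag, bool):
        # misaligned "between" the two draws: the recorded tree is
        # just [value], because we never learn what a valid run would
        # have drawn here
        return INVALID, [value]
    return VALID, [value, flag]

assert run_test([0, True]) == (VALID, [0, True])
assert run_test([0, 2]) == (INVALID, [0])
assert run_test([0]) == (OVERRUN, [0])
```

The last two assertions are the crux: `[0, 2]` is invalid with a recorded tree of `[0]`, yet replaying `[0]` by itself overruns, so keying the cache on the recorded tree would conflate the two.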
A separate problem is how to get nice caching behavior out of `DataTree.simulate_test_function`. We would like extensions of `[0, 2]` to be simulated to `[0, 2]` as invalid. But `DataTree` has a fundamental invariant that splitting at differing kwargs/ir types is flaky; you can only split at differing values. So we cannot store `[0, 2]` in the standard way, as it would be immediately rejected as flaky alongside `[0, True]`.
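A minimal sketch of that invariant (a toy class, not the real `DataTree` implementation): children of a node may only differ in value, and observing a different ir type at the same position is treated as flakiness.

```python
class Flaky(Exception):
    pass

class TreeNode:
    """Toy tree node: splits on value, never on ir type."""

    def __init__(self):
        self.ir_type = None  # ir type observed at this position
        self.children = {}   # drawn value -> child TreeNode

    def record(self, ir_type, value):
        if self.ir_type is None:
            self.ir_type = ir_type
        elif self.ir_type != ir_type:
            # e.g. [0, True] stored a boolean draw here; replaying
            # [0, 2] as an integer draw at the same position violates
            # the invariant and is rejected as flaky
            raise Flaky(f"saw {ir_type} where {self.ir_type} was recorded")
        return self.children.setdefault(value, TreeNode())

root = TreeNode()
node = root.record("integer", 0)
node.record("boolean", True)  # fine: stores [0, True]
try:
    node.record("integer", 2)  # [0, 2]: differing ir type
except Flaky:
    pass  # rejected, exactly as described above
```

This is why `[0, 2]` needs a special representation: stored naively, it collides with `[0, True]` at the second position.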
This pull special-cases the "invalid because misaligned" caching logic to include the node we were invalid at in the prefix, such that we cache `[0, 2]: invalid`. It also adds similar special logic for `DataTree` such that `simulate_test_function` will `conclude_test(Status.INVALID)` in all the right places. This already happened in practice for misaligned datas when drawing, except in the case of a null transition, where we need to probe one draw further to see whether we would have been invalid.
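The special-cased keying can be sketched like this (a toy helper with assumed names, not the real caching code): on a misaligned invalid run, the cache key is the attempted prefix truncated to include the offending node, rather than the shorter recorded tree.

```python
def cache_key(prefix, recorded, status):
    """Choose a cache key for a finished run.

    prefix:   the choices we attempted to replay, e.g. [0, 2]
    recorded: the choices actually recorded before stopping, e.g. [0]
    status:   "valid" / "invalid" / ...
    """
    if status == "invalid" and len(recorded) < len(prefix):
        # misaligned: include the node we were invalid at, so that
        # [0, 2] maps to invalid while [0, True] is cached separately
        return tuple(prefix[: len(recorded) + 1])
    return tuple(recorded)

assert cache_key([0, 2], [0], "invalid") == (0, 2)
assert cache_key([0, True], [0, True], "valid") == (0, True)
```

With this rule, `(0,)` is never claimed to be invalid, and `(0, 2)` and `(0, True)` live at distinct keys.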
This pull does not take the extra step of also caching/simulating `[0, 1]` as invalid, given that `[0, 2]` is invalid. In practice this is possible: if `[0, 2]` is invalid because 2 has the wrong ir type, then any value at `i=1` which differs in ir type can also be simulated to invalid without calling through to the test function. I think we could revisit this later if need be.
I'm not enormously happy with this solution, fwiw. Would love to go with a more principled solution if one exists.