Open jldodds opened 8 years ago
Some initial thoughts about this feature:
What kinds can we reasonably implement this feature for? From my perspective, it would be useful even if it only worked for type arguments of kind #
. I'm going to restrict myself in the below to considering only numeric type arguments, since the story is a little more clear there (how can we randomly select types from an open universe of types?).
Given a manually specified range, should we reject the range if some elements do not satisfy the constraints on the function? Should we reject only if the range contains no elements for which the constraints hold? At least a warning for invalid range elements should probably be given.
This would be even more convenient if we didn't have to manually specify a range and the user experience was just like testing a random set of values. I can see two ways this might work:
For a type parameter n : #
in some function f : {n} (...) => ...
we randomly select values of n
from some distribution, and check whether the constraints on n
hold. Naively, this means instantiating the type at the concrete value chosen for n
and typechecking it in the ordinary way, but this might be expensive for many tests. We could do better by instead checking the constraints as if they were value-level preconditions. For instance, if we have f : {n} (n <= 5) => [n] -> [n]
, we would randomly generate values for n
and concretely evaluate the predicate n <= 5
in lieu of invoking the typechecker. This is unsound if our concrete interpretation of constraints ever misaligns with the typechecker, so some care would be needed here, but my guess is that it would dramatically speed things up.
As in ordinary random testing with preconditions, there is a risk that we'll discard the vast majority of our samples if the precondition is valid for only a small part of the sample space. We could do something fancy to try and fix this degradation in sample efficiency by attempting to synthesize the valid range of the constraints via SMT solver. If we find all the satisfying solutions for the set of constraints, letting the quantified type variables be considered free variables, we could sample from that space instead. This has a couple of complexities associated:
@brianhuffman Some of my thoughts are above. What do you think about these approaches?
Further thoughts: Random sampling from solutions to an SMT problem can be done even if the SMT-solver is a black box, as outlined in this StackOverflow response: https://cs.stackexchange.com/a/33821. The gist of it: for a formula F : [n][#] -> Bool
, select a "good" hash function H : [n][#] -> #
where n
is the number of free variables in the formula (where "good" means: should be quick for the SMT solver to invert but relatively well-distributed, etc.). Then, pick a random s : #
in the range of H
. Generate a solution, if one exists, to F(x, y, ...) /\ H(x, y, ...) = s
. This forces the SMT solver to invert H
, and if the hash is a reasonable one, this lets you tweak the seed s
to get widely varying possible assignments for F
.
It would be very useful if the check command could use random type arguments as well as value arguments, even if ranges need to be manually specified for those arguments