Closed TomerAberbach closed 1 year ago
At least for the commutative
option, something like this is probably sufficient (assuming everything else is the same what you have for the current fc.func(arb)
arbitrary):
const f = (a: T, b: T) => {
const args = [stringify(a), stringify(b)]
if (commutative) {
args.sort()
}
const repr = stringify(args)
const val = outs[hash(`${seed}${repr}`) % outs.length];
recorded[repr] = val;
return hasCloneMethod(val) ? val[cloneMethod]() : val;
};
I'm still trying to wrap my head around how you'd ensure associativity... I'll let you know if I think of anything.
It would clearly be a good idea. Looking forward for an implementation of associativity.
In my opinion, as func
already handle any number of parameters, we should have another one for binary functions. Something like binaryFunc
as you suggested seems to be a good option.
I've been looking into implementing associativity and I am convinced that it's impossible to guarantee associativity unless we know all the possible values involved that could be operated on in advance.
It's a bit tricky to implement this constraint though. If the user creates the arbitrary like so, fc.binaryFunc(fc.integer({ min: 0, max: 10 }), { associative: true })
, then ideally we could figure that all possible values of the arbitrary, but this isn't possible in the general case, and the number of possible values could be effectively infinite.
I think I might have another idea to make this feasible though. Instead of introducing a new fc.binaryFunc
arbitrary, perhaps we could introduce a fc.semigroup
arbitrary (from abstract algebra). A semigroup is basically a predefined set of values with an associative binary operator.
So fc.semigroup(fc.integer())
could return Arbitrary<[Set<number>, (a: number, b: number) => number]>
, where the first value in the returned tuple is a random set of values (derived from the fc.integer()
arbitrary) included in the semigroup, and the second value in the returned tuple is an associative function over that set (meaning this function only accepts members of that set and only returns members of that set; we would throw an error if a non-member of that set was given to the function). We may also consider adding an option to control the size of semigroup.
And regarding commutative
, we could add that as an option to fc.semigroup
(e.g. fc.semigroup(arb, { commutative: boolean })
. We may also consider adding the commutative
option to the existing fc.func
(e.g. fc.func(arb, { commutative: boolean })
; we can simply sort the n parameters passed to the function).
Let me know what you think!
unless we know all the possible values involved that could be operated on in advance
Actually we partially know them. We know all the outputs for the function (if we take the same approach as the current implementation).
Knowing that for the moment the generator of function exposed by fast-check is backed an array of up to 10 elements. I am wondering if we could not profit from that one.
Let's say we are able to build an associative function, going from elements of this set to elements of this set.
If we succeed in that task we might be able to extend it to other entries (outside of the backing array too).
@TomerAberbach
Here is a potential implementation, not perfect but it seems to work well actually. The arbitrary has signature: binaryAssociativeFunction(arb: Arbitrary<T>): (a: T, b: T) => T
.
You can try it at: https://runkit.com/dubzzz/associative
const fc = require('fast-check');
// Various helpers
function next(outputForIndexes, datasetLength) {
for (let cursor = outputForIndexes.length -1 ; cursor >= 0 ; --cursor) {
outputForIndexes[cursor]++;
if (outputForIndexes[cursor] < datasetLength) {
return true;
}
outputForIndexes[cursor] = 0;
}
return false;
}
function isAssociative(datasetLength, f) {
for (let a = 0 ; a !== datasetLength ; ++a) {
for (let b = 0 ; b !== datasetLength ; ++b) {
for (let c = 0 ; c !== datasetLength ; ++c) {
if (f(f(a,b),c) !== f(a,f(b,c))) {
return false;
}
}
}
}
return true;
}
function buildAssociativeFunction(datasetLength, randNat) {
const numFunOutputs = datasetLength * datasetLength;
const outputForIndexes = [...Array(numFunOutputs)].map(() => randNat() % datasetLength);
while (true) {
const f = (a, b) => outputForIndexes[a * datasetLength + b];
if (isAssociative(datasetLength, f)) {
return f;
}
next(outputForIndexes, datasetLength)
}
}
// Arbitrary
function binaryAssociativeFunction(arb) {
return fc.tuple(
// min=3 for tests purposes,
// max=3 because buildAssociativeFunction takes too long
fc.set(arb, {minLength: 3, maxLength: 3, compare: (a, b) => fc.stringify(a) === fc.stringify(b)}),
fc.infiniteStream(fc.nat()).noBias()
)
.map(([dataset, randomStream]) => {
const associativeF = buildAssociativeFunction(
dataset.length,
() => randomStream.next().value
);
const stringifiedDataset = dataset.map(d => fc.stringify(d));
return (a, b) => {
// compute index for a
const stringifiedA = fc.stringify(a);
let indexA = stringifiedDataset.indexOf(stringifiedA);
if (indexA === -1) indexA = fc.hash(stringifiedA) % dataset.length;
// compute index for b
const stringifiedB = fc.stringify(b);
let indexB = stringifiedDataset.indexOf(stringifiedB);
if (indexB === -1) indexB = fc.hash(stringifiedB) % dataset.length;
// compute answer
const indexResult = associativeF(indexA, indexB);
if (!(indexA in dataset)) throw new Error('unexpected: invalid indexA');
if (!(indexB in dataset)) throw new Error('unexpected: invalid indexB');
if (!(indexResult in dataset)) throw new Error('unexpected: invalid indexResult');
return dataset[indexResult];
};
})
}
// Just trying with one of the generated functions
const a = 'a';
const b = 'b';
const c = 'c';
const f1 = fc.sample(binaryAssociativeFunction(fc.string()), 1)[0];
console.log('a,b -> ' + f1(a, b));
console.log('(a,b),c -> ' + f1(f1(a, b), c));
console.log('b,c -> ' + f1(b, c));
console.log('a,(b,c) -> ' + f1(a, f1(b, c)));
// Property to confirm it works
fc.assert(fc.property(
binaryAssociativeFunction(fc.string()), fc.string(), fc.string(), fc.string(),
(f, a, b, c) => f(f(a,b),c) === f(a,f(b,c))))
In the current implementation buildAssociativeFunction
takes too long. The higher the number of elements the (lot) longer it takes.
I did a little research and it seems that there may not be a way to generate associative functions efficiently.
Perhaps we could come up with certain functions that we know are associative on integers and use fc.constantFrom
to choose between them. For example, addition and multiplication modulo n are both associative:
https://runkit.com/tomeraberbach/associative-copy
Indeed, switching to well-known associative functions might be a solution.
However, it may raise some issues as the selected associative functions may have some very specific behaviours: for instance both (a, b) => (a + b) % length
and (a, b) => (a * b) % length
are commutative and both have a neutral element (0 for + and 1 for *).
I reworked a bit my initial snippet. The new version is https://runkit.com/dubzzz/associative-bis and seems to be pretty efficient in terms of performances. It does not need to use a maxLength
for the set
. Meanwhile, my current version being based on backtracking pattern I'm still trying to find something cleaner, faster...
I get "Evaluation timed out." for the fc.assert
statement. Does that happen to you?
No, it passed fine. But I may have been lucky 😅
Let's probably close it for now. I'm not sure we will make that much improvements for that one in the next releases. Might be worth re-open it if one day we find an efficient way to implement such capability.
🚀 Feature Request / Motivation
I've been using
fast-check
'sfc.func(arb)
andfc.compareFunc()
arbitraries and they've been very useful because I frequently write functions that expect other functions as arguments!However, sometimes I am testing a function that cannot accept any arbitrary function. Sometimes the function requires its input function to have certain mathematical properties. The ones I've run into most frequently are the following:
f(a, b) = f(b, a)
f(f(a, b), c) = f(a, f(b, c))
Perhaps some new
fc.binaryFunc(arb, { commutative: boolean, associative: boolean })
arbitrary could solve this problem.Example
Using
ava-fast-check
:In this case the function must be both commutative and associative, but there are situations where they are not both required.
I have some ideas for how to implement this functionality so I'd be happy to make a PR if you think this would be useful.