dubzzz / fast-check

Property based testing framework for JavaScript (like QuickCheck) written in TypeScript
https://fast-check.dev/
MIT License
4.28k stars 178 forks source link

Binary function arbitrary for functions with various mathematical properties #1541

Closed TomerAberbach closed 1 year ago

TomerAberbach commented 3 years ago

🚀 Feature Request / Motivation

I've been using fast-check's fc.func(arb) and fc.compareFunc() arbitraries and they've been very useful because I frequently write functions that expect other functions as arguments!

However, sometimes I am testing a function that cannot accept any arbitrary function. Sometimes the function requires its input function to have certain mathematical properties. The ones I've run into most frequently are the following:

Perhaps some new fc.binaryFunc(arb, { commutative: boolean, associative: boolean }) arbitrary could solve this problem.

Example

Using ava-fast-check:

import { testProp, fc } from 'ava-fast-check'

const foldInSomeOrder = (array, initial, fn) => {
  const copy = [...array]
  for (let i = n - 1; i >= 1; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[copy[i], copy[j]] = [copy[j], copy[i]]
  }

  return array.reduce((a, b) => fn(a, b), initial)
}

testProp(
  `foldInSomeOrder produces the same result as Array.prototype.reduce when the input function is commutative and associative`
  [fc.array(fc.anything()), fc.anything(), fc.func(fc.anything(), { commutative: true, associative: true })],
  (t, array, initial, fn) => {
    t.deepEqual(
      foldInSomeOrder(array, initial, fn),
      array.reduce((a, b) => fn(a, b), initial)
    )
  }
)

In this case the function must be both commutative and associative, but there are situations where they are not both required.

I have some ideas for how to implement this functionality so I'd be happy to make a PR if you think this would be useful.

TomerAberbach commented 3 years ago

At least for the commutative option, something like this is probably sufficient (assuming everything else is the same what you have for the current fc.func(arb) arbitrary):

const f = (a: T, b: T) => {
  const args = [stringify(a), stringify(b)]
  if (commutative) {
    args.sort()
  }
  const repr = stringify(args)
  const val = outs[hash(`${seed}${repr}`) % outs.length];
  recorded[repr] = val;
  return hasCloneMethod(val) ? val[cloneMethod]() : val;
};

I'm still trying to wrap my head around how you'd ensure associativity... I'll let you know if I think of anything.

dubzzz commented 3 years ago

It would clearly be a good idea. Looking forward for an implementation of associativity.

In my opinion, as func already handle any number of parameters, we should have another one for binary functions. Something like binaryFunc as you suggested seems to be a good option.

TomerAberbach commented 3 years ago

I've been looking into implementing associativity and I am convinced that it's impossible to guarantee associativity unless we know all the possible values involved that could be operated on in advance.

It's a bit tricky to implement this constraint though. If the user creates the arbitrary like so, fc.binaryFunc(fc.integer({ min: 0, max: 10 }), { associative: true }), then ideally we could figure that all possible values of the arbitrary, but this isn't possible in the general case, and the number of possible values could be effectively infinite.

I think I might have another idea to make this feasible though. Instead of introducing a new fc.binaryFunc arbitrary, perhaps we could introduce a fc.semigroup arbitrary (from abstract algebra). A semigroup is basically a predefined set of values with an associative binary operator.

So fc.semigroup(fc.integer()) could return Arbitrary<[Set<number>, (a: number, b: number) => number]>, where the first value in the returned tuple is a random set of values (derived from the fc.integer() arbitrary) included in the semigroup, and the second value in the returned tuple is an associative function over that set (meaning this function only accepts members of that set and only returns members of that set; we would throw an error if a non-member of that set was given to the function). We may also consider adding an option to control the size of semigroup.

And regarding commutative, we could add that as an option to fc.semigroup (e.g. fc.semigroup(arb, { commutative: boolean }). We may also consider adding the commutative option to the existing fc.func (e.g. fc.func(arb, { commutative: boolean }); we can simply sort the n parameters passed to the function).

Let me know what you think!

dubzzz commented 3 years ago

unless we know all the possible values involved that could be operated on in advance

Actually we partially know them. We know all the outputs for the function (if we take the same approach as the current implementation).


Knowing that for the moment the generator of function exposed by fast-check is backed an array of up to 10 elements. I am wondering if we could not profit from that one.

Let's say we are able to build an associative function, going from elements of this set to elements of this set.

Here is a dummy snippet that can be used to compute how many associative functions there are for N elements ```js function next(outputForIndexes, datasetLength) { for (let cursor = outputForIndexes.length -1 ; cursor >= 0 ; --cursor) { outputForIndexes[cursor]++; if (outputForIndexes[cursor] < datasetLength) { return true; } outputForIndexes[cursor] = 0; } return false; } function isAssociative(datasetLength, f) { for (let a = 0 ; a !== datasetLength ; ++a) { for (let b = 0 ; b !== datasetLength ; ++b) { for (let c = 0 ; c !== datasetLength ; ++c) { if (f(f(a,b),c) !== f(a,f(b,c))) { return false; } } } } return true; } function printFunction(datasetLength, f) { console.log("Function is:") for (let a = 0 ; a !== datasetLength ; ++a) { for (let b = 0 ; b !== datasetLength ; ++b) { console.log(` | (${a}, ${b}) -> ${f(a,b)}`); } } return true; } function compute(datasetLength) { let count = 0; let totalCount = 0; const numFunOutputs = datasetLength * datasetLength; const outputForIndexes = [...Array(numFunOutputs-1).fill(0), -1]; while (next(outputForIndexes, datasetLength)) { ++totalCount; const f = (a, b) => outputForIndexes[a * datasetLength + b]; if (isAssociative(datasetLength, f)) { ++count; printFunction(datasetLength, f); } } console.log(`Found ${count} associative functions on the ${totalCount} possible`) } compute(2) ```

If we succeed in that task we might be able to extend it to other entries (outside of the backing array too).

dubzzz commented 3 years ago

@TomerAberbach

Here is a potential implementation, not perfect but it seems to work well actually. The arbitrary has signature: binaryAssociativeFunction(arb: Arbitrary<T>): (a: T, b: T) => T.

You can try it at: https://runkit.com/dubzzz/associative

const fc = require('fast-check');

// Various helpers

function next(outputForIndexes, datasetLength) {
  for (let cursor = outputForIndexes.length -1 ; cursor >= 0 ; --cursor) {
    outputForIndexes[cursor]++;
    if (outputForIndexes[cursor] < datasetLength) {
      return true;
    }
    outputForIndexes[cursor] = 0;
  }
  return false;
}
function isAssociative(datasetLength, f) {
  for (let a = 0 ; a !== datasetLength ; ++a) {
    for (let b = 0 ; b !== datasetLength ; ++b) {
      for (let c = 0 ; c !== datasetLength ; ++c) {
        if (f(f(a,b),c) !== f(a,f(b,c))) {
          return false;
        }
      }
    }
  }
  return true;
}
function buildAssociativeFunction(datasetLength, randNat) {
  const numFunOutputs = datasetLength * datasetLength;
  const outputForIndexes = [...Array(numFunOutputs)].map(() => randNat() % datasetLength);
  while (true) {
    const f = (a, b) => outputForIndexes[a * datasetLength + b];
    if (isAssociative(datasetLength, f)) {
      return f;
    }
    next(outputForIndexes, datasetLength)
  }
}

// Arbitrary

function binaryAssociativeFunction(arb) {
    return fc.tuple(
        // min=3 for tests purposes,
        // max=3 because buildAssociativeFunction takes too long
        fc.set(arb, {minLength: 3, maxLength: 3, compare: (a, b) => fc.stringify(a) === fc.stringify(b)}),
        fc.infiniteStream(fc.nat()).noBias()
    )
    .map(([dataset, randomStream]) => {
        const associativeF = buildAssociativeFunction(
            dataset.length,
            () => randomStream.next().value
        );
        const stringifiedDataset = dataset.map(d => fc.stringify(d));
        return (a, b) => {
            // compute index for a
            const stringifiedA = fc.stringify(a);
            let indexA = stringifiedDataset.indexOf(stringifiedA);
            if (indexA === -1) indexA = fc.hash(stringifiedA) % dataset.length;
            // compute index for b
            const stringifiedB = fc.stringify(b);
            let indexB = stringifiedDataset.indexOf(stringifiedB);
            if (indexB === -1) indexB = fc.hash(stringifiedB) % dataset.length;
            // compute answer
            const indexResult = associativeF(indexA, indexB);
            if (!(indexA in dataset)) throw new Error('unexpected: invalid indexA');
            if (!(indexB in dataset)) throw new Error('unexpected: invalid indexB');
            if (!(indexResult in dataset)) throw new Error('unexpected: invalid indexResult');
            return dataset[indexResult];
        };
    })
}

// Just trying with one of the generated functions

const a = 'a';
const b = 'b';
const c = 'c';
const f1 = fc.sample(binaryAssociativeFunction(fc.string()), 1)[0];
console.log('a,b -> ' + f1(a, b));
console.log('(a,b),c -> ' + f1(f1(a, b), c));
console.log('b,c -> ' + f1(b, c));
console.log('a,(b,c) -> ' + f1(a, f1(b, c)));

// Property to confirm it works

fc.assert(fc.property(
    binaryAssociativeFunction(fc.string()), fc.string(), fc.string(), fc.string(),
    (f, a, b, c) => f(f(a,b),c) === f(a,f(b,c))))

In the current implementation buildAssociativeFunction takes too long. The higher the number of elements the (lot) longer it takes.

TomerAberbach commented 3 years ago

I did a little research and it seems that there may not be a way to generate associative functions efficiently.

Perhaps we could come up with certain functions that we know are associative on integers and use fc.constantFrom to choose between them. For example, addition and multiplication modulo n are both associative: https://runkit.com/tomeraberbach/associative-copy

dubzzz commented 3 years ago

Indeed, switching to well-known associative functions might be a solution.

However, it may raise some issues as the selected associative functions may have some very specific behaviours: for instance both (a, b) => (a + b) % length and (a, b) => (a * b) % length are commutative and both have a neutral element (0 for + and 1 for *).

I reworked a bit my initial snippet. The new version is https://runkit.com/dubzzz/associative-bis and seems to be pretty efficient in terms of performances. It does not need to use a maxLength for the set. Meanwhile, my current version being based on backtracking pattern I'm still trying to find something cleaner, faster...

TomerAberbach commented 3 years ago

I get "Evaluation timed out." for the fc.assert statement. Does that happen to you?

dubzzz commented 3 years ago

No, it passed fine. But I may have been lucky 😅

dubzzz commented 1 year ago

Let's probably close it for now. I'm not sure we will make that much improvements for that one in the next releases. Might be worth re-open it if one day we find an efficient way to implement such capability.