drym-org / qi

An embeddable flow-oriented language.

Baseline performance audit #20

Open · countvajhula opened 2 years ago

countvajhula commented 2 years ago

The current performance benchmarks in profile/ (run via make profile) are inaccurate and, moreover, not comprehensive.

  1. The existing benchmarks need auditing to ensure that the results are in accord with actual performance, with the benchmarking apparatus properly factored out.

For example, @michaelballantyne reports:

the difference between the flat and recursive benchmarks is an artifact of your benchmarking infrastructure. In particular, check-value constructs a list that is as long as the number of iterations of the test, as take is an operation over lists rather than sequences:

(for ([i (take how-many (cycle inputs))]) (fn i))

It looks like the cost of constructing that list accounts for much of the time in the "Conditionals" benchmark, for example. With a testing setup that doesn't construct such a list, the Racket implementation is much faster than Qi. I suspect that in the recursive benchmarks the cost of the computation is higher relative to the construction of the list of inputs and therefore is more visible in the benchmarks.
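
For reference, here is a minimal sketch of a setup that avoids constructing such a list (the signature of check-value is assumed from the snippet above):

#lang racket

;; Hypothetical revision of check-value: in-cycle is a lazy sequence,
;; and the parallel in-range clause bounds the iteration count, so no
;; how-many-long intermediate list is ever constructed.
;; Assumes inputs is a non-empty list.
(define (check-value fn inputs how-many)
  (for ([i (in-cycle inputs)]
        [_ (in-range how-many)])
    (fn i)))

(check-value add1 '(1 2 3) 1000000)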

  2. In addition, benchmark coverage is sparse at the moment and should be made more comprehensive -- e.g. the benchmarks in profile/forms.rkt should probably be expanded to include all Qi forms (currently they cover only a few, such as relay and group); a sketch follows below.
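
As a sketch, benchmarks for the remaining forms could follow a common pattern like this (the bench helper and the iteration counts are assumptions for illustration, not the current profile/forms.rkt API):

#lang racket

(require qi)

;; Hypothetical timing helper: applies f to args how-many times and
;; reports the elapsed CPU time measured by time-apply.
(define (bench name f how-many . args)
  (collect-garbage)
  (define-values (results cpu real gc)
    (time-apply (λ ()
                  (for ([_ (in-range how-many)])
                    (apply f args)))
                null))
  (printf "~a: ~a ms~n" name cpu))

(bench "relay" (flow (== add1 sub1)) 100000 1 2)
(bench "group" (flow (group 1 add1 sub1)) 100000 1 2)
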
countvajhula commented 2 years ago

Note for down the line: it would be great to add continuous benchmarking into the CI workflow, so that the performance effects of each commit are automatically tracked:

https://github.com/benchmark-action/github-action-benchmark

This will likely involve calling the existing benchmarking functions in profile/* but generating the output in the JSON format described at the above link, i.e. {"name": <name-of-the-benchmark>, "unit": "ms", "value": <the-measured-value>}.

These should be runnable via a Makefile target, e.g. make report-benchmarks, which can be run in the GitHub Actions workflow just like the existing targets for testing, etc.
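
As a sketch, the report script could assemble the measurements into jsexprs and emit them with the json library (the results used here are placeholders; wiring this up to the actual profile/ functions is the real task):

#lang racket

(require json)

;; github-action-benchmark expects a JSON array of objects of the
;; form {"name": ..., "unit": ..., "value": ...}.
(define (results->jsexpr results) ; results: list of (name . ms) pairs
  (for/list ([r (in-list results)])
    (hasheq 'name (car r)
            'unit "ms"
            'value (cdr r))))

;; Placeholder results, for illustration only.
(write-json (results->jsexpr '(("relay" . 12.5) ("group" . 8.3))))

The make report-benchmarks target would then just run this script and redirect its output to the file consumed by the action.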