Lately, much of our benchmark data has become less useful because the individual measurements have gotten so small that noise dominates them. Our JS and LLVM backends also hit fewer errors such as stack overflows now, so we can generally use larger arguments.
We should still decide when and how often we want to normalize the arguments, since every normalization makes historical data less comparable.
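As a rough sketch of what such a normalization step could look like (the `calibrate` helper, the doubling strategy, and the one-second target are all assumptions for illustration, not our actual setup), one could grow a benchmark's argument until a single run takes long enough to measure reliably:

```python
import time

def calibrate(run, arg: int, target_s: float = 1.0) -> int:
    """Hypothetical helper: double the benchmark argument until one
    run of `run(arg)` takes at least `target_s` seconds, so that the
    measurement rises above timer noise."""
    while True:
        start = time.perf_counter()
        run(arg)
        if time.perf_counter() - start >= target_s:
            return arg
        arg *= 2  # measurement still too small; try a larger argument
```

Whatever the exact strategy, each recalibration would shift the argument sizes and thus break direct comparisons with older runs, which is why the "when and how often" question matters.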
I've also started writing a benchmark config for reference arguments, so that we can compare our data with the benchmarks from other languages (this isn't finished yet, since it's hard to find good reference values).
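For illustration, such a config might map each benchmark to a fixed reference argument (a minimal sketch; the benchmark names and values below are placeholders, not the real reference values we'd pick):

```python
# Hypothetical reference-argument config; values are placeholders.
# The real numbers would come from whichever cross-language suite
# we end up comparing against.
REFERENCE_ARGS = {
    "fibonacci":   35,
    "nbody":       1_000_000,
    "binarytrees": 18,
}

def args_for(benchmark: str, fallback: int) -> int:
    """Return the reference argument for a benchmark, falling back to
    our own calibrated value while no reference has been agreed on."""
    return REFERENCE_ARGS.get(benchmark, fallback)
```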