bdemann opened 1 day ago
The further I get into this project, the less helpful I think it is. Here are some things I have learned:
Since this is an average over all of the methods called across all of our tests, it measures not only the efficiency of azle but also how many instructions our tests run. As a result, a comparison between versions is only meaningful if both versions run the exact same methods the exact same number of times. If we introduce another method that takes a lot of instructions, we will not be measuring a decrease in azle's efficiency; we will be measuring an increase in the size of azle's tests.
This is particularly problematic for tests that are arbitrarily large. A single large test very quickly overwhelms the average, making it impossible to use in any sort of weighted global measurement without either drowning out every other component of that measurement or diminishing the average to the point that it might as well not be part of the calculation.
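To make the skew concrete, here is a small sketch with purely made-up numbers (none of these counts come from real runs):

```typescript
// Purely illustrative numbers, not real measurements: one oversized test
// dominates a global average of per-call instruction counts.
const instructionCounts: Record<string, number[]> = {
    simpleQuery: [1_000, 1_100, 900],
    simpleUpdate: [5_000, 5_200],
    hugeStressTest: [900_000_000] // one arbitrarily large test
};

const allCounts = Object.values(instructionCounts).flat();
const globalAverage =
    allCounts.reduce((sum, count) => sum + count, 0) / allCounts.length;

// Roughly 150 million: the stress test swamps the average, so a real change
// in azle's per-call overhead would be invisible next to a change in which
// tests happen to run.
console.log(globalAverage); // 150002200
```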
If we want a single set of data that represents how efficient azle is, I think it would be better to have a benchmark example that is a good representation of what we think average usage looks like, and then use that exact same example for comparisons between azle versions.
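Something like the following could then compare two versions. The file paths and JSON shape here are assumptions for illustration, not an existing format:

```typescript
import { readFileSync } from 'fs';

// Hypothetical result shape: per-method average instruction counts produced
// by running the same fixed benchmark example on each azle version.
type BenchmarkResult = Record<string, number>;

function loadResult(path: string): BenchmarkResult {
    return JSON.parse(readFileSync(path, 'utf-8'));
}

// These file paths are assumptions for illustration.
const baseline = loadResult('benchmarks/azle-previous.json');
const candidate = loadResult('benchmarks/azle-current.json');

// Because both runs exercised the exact same methods the exact same number
// of times, per-method deltas reflect azle's efficiency, not test volume.
for (const [method, baselineInstructions] of Object.entries(baseline)) {
    const delta = candidate[method] - baselineInstructions;
    const percent = ((delta / baselineInstructions) * 100).toFixed(2);
    console.log(`${method}: ${delta >= 0 ? '+' : ''}${percent}%`);
}
```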
Separately, we should write a script that runs after all of the benchmarks finish, looks at all of the benchmark results, and computes averages and other aggregate statistics.
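A minimal sketch of what that script could look like, assuming benchmark results land as JSON arrays in a results directory (the directory name and entry shape are assumptions):

```typescript
import { readdirSync, readFileSync } from 'fs';
import { join } from 'path';

// Assumed entry shape: one record per executed canister method call.
type BenchmarkEntry = {
    method: string;
    instructions: number;
};

// Assumed location of per-test benchmark output; adjust to wherever the
// benchmarks actually write their results.
const BENCHMARKS_DIR = 'benchmark_results';

// Each file is assumed to contain a JSON array of BenchmarkEntry objects.
const entries: BenchmarkEntry[] = readdirSync(BENCHMARKS_DIR)
    .filter((file) => file.endsWith('.json'))
    .flatMap((file) =>
        JSON.parse(readFileSync(join(BENCHMARKS_DIR, file), 'utf-8'))
    );

// Group counts per method so that arbitrarily large tests can be inspected
// on their own instead of being folded into a single global average.
const countsByMethod = new Map<string, number[]>();
for (const { method, instructions } of entries) {
    const counts = countsByMethod.get(method) ?? [];
    counts.push(instructions);
    countsByMethod.set(method, counts);
}

for (const [method, counts] of countsByMethod) {
    const average =
        counts.reduce((sum, count) => sum + count, 0) / counts.length;
    console.log(
        `${method}: ${counts.length} calls, avg ${average.toFixed(0)} instructions`
    );
}
```

Reporting per-method averages rather than one global number sidesteps the weighting problem described above: a huge test still shows up, but only in its own row.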