Closed · NickCondron closed this 1 year ago
Hi! Thanks a ton for this contribution to the library. The benchmarking is quite useful, and I see (based on your other PR, #7) that it correctly pointed out a significant improvement.
I've left a few comments, mostly regarding minor things. I'll be happy to merge the PR once the discussions on those comments are resolved! :)
It is worth noting that if you run the suite of benchmarks a few times back to back, Criterion reports that the performance of some functions improves pretty drastically (by as much as 24% on my machine in one case). I am not sure why this happens; it may be related to CPU caching and memory locality.
Perhaps it would be a good idea to add benchmarks that use random data generated on the fly?
> It is worth noting that if you run the suite of benchmarks a few times back to back, Criterion reports that the performance of some functions improves pretty drastically (by as much as 24% on my machine in one case). I am not sure why this happens; it may be related to CPU caching and memory locality.
Unfortunately, this is a limitation of Criterion and of simple wall-time benchmarks in general. They can be affected by inconsistencies on a given machine (e.g. other processes, energy-saving modes on laptops, operating system scheduling, etc.). The main way to reduce variance is to run the benchmarks for longer, but in my experience that doesn't get you very far when all the factors above are in play.
The benchmarking suite Iai (by the same author) greatly improves on consistency at the cost of some other tradeoffs. See more here: https://bheisler.github.io/criterion.rs/book/iai/comparison.html
I'm willing to implement Iai benchmarks, but I thought that could be a separate step.
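For reference, a minimal Iai benchmark might look like the sketch below. Iai runs each function under Cachegrind and reports instruction counts, so results are far less sensitive to machine noise than wall time. The crate and function names here are assumptions for illustration, not this PR's actual code:

```rust
// benches/iai_bench.rs -- hypothetical file; requires `iai` as a dev-dependency.
use iai::black_box;
use the_crate::levenshtein; // crate and function names are assumptions

// `black_box` prevents the compiler from constant-folding the inputs.
fn iai_levenshtein() -> usize {
    levenshtein(black_box("kitten"), black_box("sitting"))
}

iai::main!(iai_levenshtein);
```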
> Perhaps it would be a good idea to add benchmarks that use random data generated on the fly?
Yeah, I tried this at first but reconsidered, because in theory it could introduce more noise into the benchmark timings. I don't think it matters that much, though. Also, this Levenshtein distance generally isn't run on truly random data anyway, and creating realistic data sets to benchmark on would be a larger challenge.
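For what it's worth, Criterion's `iter_batched` can keep on-the-fly generation out of the timed region. A minimal sketch, assuming a `levenshtein` function and 64-character alphanumeric inputs (all names here are placeholders):

```rust
use criterion::{black_box, criterion_group, criterion_main, BatchSize, Criterion};
use rand::{distributions::Alphanumeric, Rng};
use the_crate::levenshtein; // crate and function names are assumptions

fn random_string<R: Rng>(rng: &mut R, len: usize) -> String {
    rng.sample_iter(Alphanumeric).take(len).map(char::from).collect()
}

fn bench_random(c: &mut Criterion) {
    c.bench_function("levenshtein/random", |b| {
        b.iter_batched(
            // The setup closure runs outside the measurement, so only the
            // distance computation itself is timed.
            || {
                let mut rng = rand::thread_rng();
                (random_string(&mut rng, 64), random_string(&mut rng, 64))
            },
            |(a, b)| levenshtein(black_box(&a), black_box(&b)),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_random);
criterion_main!(benches);
```

The inputs still vary between iterations, though, so run-to-run results are less comparable than with a fixed data set.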
> Unfortunately, this is a limitation of Criterion and of simple wall-time benchmarks in general. They can be affected by inconsistencies on a given machine (e.g. other processes, energy-saving modes on laptops, operating system scheduling, etc.). The main way to reduce variance is to run the benchmarks for longer, but in my experience that doesn't get you very far when all the factors above are in play.
>
> The benchmarking suite Iai (by the same author) greatly improves on consistency at the cost of some other tradeoffs. See more here: https://bheisler.github.io/criterion.rs/book/iai/comparison.html
>
> I'm willing to implement Iai benchmarks, but I thought that could be a separate step.
Yeah, I see what you mean, and I agree that Iai benchmarks can be a separate step if need be. For now, I think we can add these Criterion benchmarks and do a little research into how reliable they are (e.g. optimize the implementations a little and measure the difference, or compile with different versions of rustc) to see whether Iai is a better candidate. The comparison you linked does suggest that Iai is the better choice for running benchmarks in CI environments, which definitely draws me towards it. In conclusion, it is worth looking into, but not something I'll push for in this PR, which I think is already a good addition.
> Yeah, I tried this at first but reconsidered, because in theory it could introduce more noise into the benchmark timings. I don't think it matters that much, though. Also, this Levenshtein distance generally isn't run on truly random data anyway, and creating realistic data sets to benchmark on would be a larger challenge.
Fair enough. The improvements we're discussing in this thread probably have a higher ROI than tweaking this, so we can drop the random on-the-fly data idea for now.
@NickCondron a gentle reminder about the discussions on this PR. They're minor things, and I'll be happy to merge it once they're addressed. :)
Thanks for all the good work on this. Merging! :)
I pregenerated the random data to remove some noise.
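A sketch of what loading pregenerated data in a Criterion bench can look like; the file path, pair-per-line format, and crate/function names are all assumptions rather than this PR's actual layout:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use the_crate::levenshtein; // crate and function names are assumptions

// Hypothetical committed data file: one tab-separated string pair per line.
static DATA: &str = include_str!("data/random_pairs.txt");

fn bench_pregenerated(c: &mut Criterion) {
    let pairs: Vec<(&str, &str)> = DATA
        .lines()
        .filter_map(|line| line.split_once('\t'))
        .collect();
    c.bench_function("levenshtein/pregenerated", |b| {
        b.iter(|| {
            for &(a, s) in &pairs {
                black_box(levenshtein(black_box(a), black_box(s)));
            }
        })
    });
}

criterion_group!(benches, bench_pregenerated);
criterion_main!(benches);
```

Because the inputs are identical on every run, run-to-run comparisons no longer include data-generation variance.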
The .gitattributes file makes git treat the data file as a binary file instead of text.
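For reference, the relevant .gitattributes entry might look like this (the path is a placeholder, not the actual file in this PR):

```
# Skip diffing, merging, and line-ending normalization for the benchmark data.
benches/data/random_pairs.txt binary
```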