NickCrews / mismo

The SQL/Ibis powered sklearn of record linkage
https://nickcrews.github.io/mismo/
GNU Lesser General Public License v3.0

Testing: Standardized workflow and datasets for speed and performance benchmarking #19

Closed. Opened by OlivierBinette; closed 2 months ago.

OlivierBinette commented 9 months ago

There's currently a good unit test setup for checking functionality.

However, when developing new features, how could one go about benchmarking the performance of alternatives?

This is partly backend-specific, but I think it is important to have a way to check performance, especially if there are functions relying on sklearn, numpy, or other Python packages.

A few questions to answer here would be:

I think the solution to this should be kept as simple as possible. It'd be great to have a class that I can instantiate to configure a performance comparison, run it, and then save a short markdown report that contains the results and my system configuration.

One thing I'd use this for, specifically, is to compare the performance of sklearn metrics to Ibis implementations. Sklearn metrics are very slow in my experience and I've struggled with them.
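
For concreteness, here is a minimal sketch of the kind of helper described above. It is not mismo API: the class name `PerfComparison`, its methods, and the report format are all hypothetical.

```python
import platform
import time
from pathlib import Path


class PerfComparison:
    """Hypothetical helper: time alternative implementations of one task
    and save a short markdown report with results and system info."""

    def __init__(self, name, n_runs=5):
        self.name = name
        self.n_runs = n_runs
        self.results = {}

    def run(self, label, fn, *args, **kwargs):
        # Best-of-n wall-clock timing; a real version might also record memory.
        self.results[label] = min(
            self._time_once(fn, *args, **kwargs) for _ in range(self.n_runs)
        )

    @staticmethod
    def _time_once(fn, *args, **kwargs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        return time.perf_counter() - start

    def save_report(self, path):
        lines = [
            f"# {self.name}",
            "",
            f"Python {platform.python_version()} on {platform.platform()}",
            "",
            "| implementation | best time (s) |",
            "| --- | --- |",
            *(f"| {label} | {secs:.4f} |" for label, secs in self.results.items()),
        ]
        Path(path).write_text("\n".join(lines) + "\n")


# Usage: the two callables are placeholders for e.g. an sklearn metric
# and an Ibis/DuckDB implementation of the same comparison.
cmp = PerfComparison("toy comparison")
cmp.run("list comprehension", lambda: [i * i for i in range(1_000_000)])
cmp.run("generator + sum", lambda: sum(i * i for i in range(1_000_000)))
cmp.save_report("perf_report.md")
```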

NickCrews commented 9 months ago

I agree benchmarking is important.

I would focus on only DuckDB for now, though the framework should be designed so it can be extended to other backends. I say this because, when I am working on my laptop, DuckDB is the obvious choice: why would I use polars or pandas instead, when I can't see any advantage they have over it? I can see the benefit of Spark, but I don't know how much benefit there would be for the added complexity, so leave it out of the first iteration?
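
As a sketch of what "DuckDB only, but extensible" could look like in the test suite (assuming pytest; the fixture name and `BACKENDS` list are illustrative, not existing mismo code):

```python
import ibis
import pytest

# Only DuckDB for now; adding e.g. "datafusion" or "pyspark" later only
# requires extending this list and the fixture body.
BACKENDS = ["duckdb"]


@pytest.fixture(params=BACKENDS)
def con(request):
    if request.param == "duckdb":
        return ibis.duckdb.connect()  # in-memory DuckDB connection
    raise NotImplementedError(request.param)
```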

I think we also need to plan ahead for new versions of DuckDB coming out. New versions will have better performance, so we need to hold the DuckDB version constant when comparing runs. But we probably don't want to force all future benchmarks to use an ancient version of DuckDB forever; instead we should make it possible to plug a new DuckDB into an old mismo.

Perhaps https://github.com/airspeed-velocity/asv is a solution? I need to look at how it stores results. Ideally, in the spirit of simplicity, results would just be a set of JSON/YAML files in this repo? It would be great to avoid an external datastore.
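
For reference, asv benchmarks are plain Python: methods prefixed with `time_` are timed, `setup` runs before each measurement, and results are stored as JSON files in a configurable directory, so they could live in this repo. A minimal sketch (the file name, class, and workload below are made up, not mismo code):

```python
# benchmarks/bench_example.py -- hypothetical asv benchmark file

class TimeToyBlocking:
    # asv runs each time_* method once per combination of params.
    params = [1_000, 10_000]
    param_names = ["n_rows"]

    def setup(self, n_rows):
        # Build whatever input the benchmark needs, e.g. an Ibis/DuckDB table;
        # a plain list keeps this sketch self-contained.
        self.keys = list(range(n_rows))

    def time_pairwise(self, n_rows):
        # Placeholder workload; a real benchmark would call into mismo.
        [(a, b) for a in self.keys[:200] for b in self.keys[:200]]
```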

One requirement I would like is to make it easy to backport a benchmark to old versions of mismo. So we write a new benchmark and add it to this repo (we could store benchmarks in another repo, but keeping them all together would be better in my mind), and we want a way of running that new benchmark against the mismo from 6 months ago.

NickCrews commented 9 months ago

I just sank 2 hours into this, but eventually gave up because I didn't find anything that I liked that much :(

Thoughts for the next attempt:

Someone else's thoughts on choosing between these tools.

- asv
- pytest-benchmark
- pytest-memray
- pytest-monitor

NickCrews commented 2 months ago

I have implemented several tests with pytest-benchmark; grep the codebase for "benchmark" and you'll find them. It seems to be working OK for us so far. I'm going to close this as completed; we can open a new issue to iterate on this process if we need to.
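
For anyone landing here later, a pytest-benchmark test looks roughly like the sketch below (illustrative only, not one of the actual tests in the codebase); the `benchmark` fixture calls the function repeatedly and records timing statistics:

```python
import ibis


def test_filter_count_speed(benchmark):
    # Tiny in-memory table; Ibis executes it on DuckDB, its default backend.
    t = ibis.memtable({"key": list(range(10_000))})

    def run():
        return t.filter(t.key % 2 == 0).count().execute()

    # ``benchmark`` times repeated calls to ``run`` and returns its result.
    result = benchmark(run)
    assert result == 5_000
```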