astropy / astropy-benchmarks

Benchmarks for the astropy project
https://spacetelescope.github.io/bench/astropy-benchmarks/
BSD 3-Clause "New" or "Revised" License

How we do benchmarking #120

Open · nstarman opened this issue 1 month ago

nstarman commented 1 month ago

Is there a reason, beyond historical, that we use asv over pytest-benchmark? Looking at the two tools, pytest-benchmark has 1.5x more stars and 8x greater usage (as measured by GitHub dependents). Also, pytest-benchmark integrates into our existing pytest framework, so this repo could pull tests directly from astropy's test suite using a pytest mark (e.g. pytest.mark.benchmark_only).
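
For illustration, a minimal sketch of what such a test could look like (the `benchmark_only` mark name is hypothetical and would need registering in astropy's pytest config; the `benchmark` fixture is the one pytest-benchmark provides):

```python
import numpy as np
import pytest

from astropy import units as u


@pytest.mark.benchmark_only  # hypothetical mark used to select benchmark-only tests
def test_quantity_scaling(benchmark):
    # pytest-benchmark's `benchmark` fixture repeatedly calls and times the callable
    q = np.arange(1_000_000.0) * u.m
    benchmark(lambda: q * 2)
```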

Cadair commented 1 month ago

As I very recently discovered, the big feature asv has which pytest-benchmark does not is the ability to do memory usage benchmarking as well.
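
(For reference, asv's memory benchmarks are just name-prefixed functions; a minimal sketch following the `mem_`/`peakmem_` naming conventions from asv's docs:)

```python
import numpy as np


class MemorySuite:
    """asv discovers these by name prefix inside the benchmarks/ directory."""

    def mem_large_array(self):
        # mem_* benchmarks report the memory footprint of the returned object
        return np.zeros((1000, 1000))

    def peakmem_sum_large_array(self):
        # peakmem_* benchmarks report peak resident memory while the body runs
        np.zeros((1000, 1000)).sum()
```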

nstarman commented 1 month ago

https://github.com/bloomberg/pytest-memray is a thin wrapper connecting https://github.com/bloomberg/memray (12K stars) to pytest.

Timing tests and memory benchmark tests are often two different tests, so IMO it's fine to use two popular tools, one for each.
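
A minimal sketch of how a memory test could sit next to the timing tests (the limit value is illustrative; `limit_memory` is the marker pytest-memray provides, and such tests are run with pytest's `--memray` flag):

```python
import numpy as np
import pytest


@pytest.mark.limit_memory("100 MB")  # marker from pytest-memray; illustrative limit
def test_allocation_stays_bounded():
    # Fails if the test body allocates more memory than the stated limit
    data = np.zeros((1000, 1000))
    assert data.sum() == 0
```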

pllim commented 1 month ago

Still not clear to me whether the pytest way can benchmark over a long period of time like asv can.

nstarman commented 1 month ago

There is native capability for that: https://pytest-benchmark.readthedocs.io/en/latest/comparing.html, but this is also where #117 comes in: ingest the saved data from pytest-benchmark and present it in a more granular and explorable way. It's much the same way we do code coverage, where pytest measures the coverage but we use a service (codecov.io) to better present and assess the data.
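
For context, the data pytest-benchmark saves (e.g. with `--benchmark-autosave`) is plain JSON under `.benchmarks/`, so a tool like the one discussed in #117 could ingest it directly. A rough sketch, assuming the top-level "benchmarks" list and per-test "stats" dict those files carry:

```python
import json
from pathlib import Path


def summarize(run_file: Path) -> None:
    """Print the mean/stddev recorded for each benchmark in one saved run."""
    data = json.loads(run_file.read_text())
    for bench in data["benchmarks"]:
        stats = bench["stats"]
        print(f"{bench['name']}: mean={stats['mean']:.6f} s, stddev={stats['stddev']:.6f} s")


# e.g. summarize the most recent saved run:
# summarize(sorted(Path(".benchmarks").rglob("*.json"))[-1])
```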

pllim commented 1 month ago

Given the dissatisfaction with Codecov since it was bought out, I am still unsure...

nstarman commented 1 month ago

But with common open-source report formats we can switch visualization methods. For code coverage the tools use the same standard, so we could switch to coveralls if we wanted. pytest-benchmark is similar: people are building on top of its format since it integrates with pytest, so we can use pytest-benchmark out of the box or augment it. I find these concerns to be arguments in favor of using more swappable tools and things that hook into standard frameworks!

Cadair commented 1 month ago

What other options for actually visualising the data exist? If there are already well maintained options I would feel a lot happier about it.

On the memory vs timing thing, while I agree the benchmarks are likely to be different, running two different pytest plugins and (presumably?) two different visualisation/reporting tools feels like more effort? Maybe that's worth it, but I am not familiar enough with it all to know.

astrofrog commented 1 month ago

Just to put it on the table: if we wanted, we could use pytest-benchmark to define/run the benchmarks and asv for visualization - see discussion here - we would basically 'just' need to convert pytest-benchmark JSON to asv JSON.

astrofrog commented 1 month ago

Interesting comment in that discussion

nstarman commented 1 month ago

Pandas also has a good discussion about asv: https://github.com/pandas-dev/pandas/issues/45049

astrofrog commented 1 month ago

Looks like speed.python.org uses codespeed, not to be confused with https://codspeed.io (which apparently must be to measure the speed of fish)