Improve test workflows - Githubissues

SNEWS2 / snewpy

A Python package for working with supernova neutrinos

https://snewpy.readthedocs.io

BSD 3-Clause "New" or "Revised" License

26 stars 19 forks source link

Improve test workflows #154

Closed Sheshuk closed 2 years ago

Sheshuk commented 2 years ago

Prevent publishing benchmark results on main (so it won't crash the test)

JostMigenda commented 2 years ago

Is there any particular reason to use if: ${{ GITHUB_EVENT_NAME != "push" }} instead of if: ${{ GITHUB_EVENT_NAME == "pull_request" }}? Doesn’t make a difference now, of course, but just in case we ever add other workflow triggers …

Sheshuk commented 2 years ago

I don't know. I'm not sure if it does include there the type as well, like "pull_request.ready_for_review", so I just tried to stay on the safe side. Please correct if you know

JostMigenda commented 2 years ago

It doesn’t; the type can be accessed separately via github.event.action, see this example. Note also that the syntax in if expressions is github.event_name instead of the environment variable GITHUB_EVENT_NAME.

github-actions[bot] commented 2 years ago

Result of Benchmark Tests

Benchmark	Min	Max	Mean
python/snewpy/test/test_snowglobes.py::test_simulation_chain_benchmark	5.96	6.24	6.05 +- 0.11

JostMigenda commented 2 years ago

Hm … so I think this is working as intended but integration tests fail because one of the changes in #132 doesn’t work under Python 3.7. I’m investigating …

JostMigenda commented 2 years ago

Yep; now that #155 is fixed this indeed works. Couple of things to note:

Instead of posting benchmark results on every run, the action will update the initial benchmark post. That’s probably a good thing (to avoid spamming the issue), but worth keeping in mind.
Benchmark times vary quite a bit between different Python versions, from 4.17s to 5.82s. A 40% performance difference seems surprising to me; I’m re-running to see how much of that is random variation (e.g. due to different hardware).

Sheshuk commented 2 years ago

Benchmark times vary quite a bit between different Python versions

This can be caused by the default number of workers used for ThreadPoolExecutor:

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5

Changed in version 3.8: Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU

JostMigenda commented 2 years ago

Changes from first to second run: v3.7: 4.17 ➔ 4.96 v3.8: 5.64 ➔ 5.98 v3.9: 5.17 ➔ 6.05 v3.10: 5.82 ➔ 5.33

So yes, it looks like the difference is largely random variation. So this is probably good enough to catch extreme (2× or higher) performance differences, but for smaller changes, more reliable benchmarking is still required. Good to know!