The developer onboarding says that we currently use pyinstrument to benchmark the scale_run script, so I thought I'd make a few quick comparisons against ASV:

- asv integrates with git: it has a find function, similar to git bisect, to identify where the "biggest" slowdown in a period occurred. This could be useful for resolving situations where the cron job flags a slowdown at the end of a day, but the day includes multiple PR merges.
- asv is not actively maintained nor up-to-date.

The maintainability issue jumps out as something of a red flag to me, but asv otherwise looks to have slightly better features, at the cost of needing a dedicated machine. pyinstrument seems more flexible, however; it's fairly easy to write a pseudocode GH Actions workflow using it right away:
- Checkout repository
- Setup conda
- Setup conda environment from developer/user docs
- Install pyinstrument into the environment
- Run pyinstrument, producing an HTML output (and maybe a session output so we can reload results later); a sketch of this step follows the list
- Push the HTML file somewhere? Maybe to a separate branch so that we can manually view the files with htmlpreview?
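For the "run pyinstrument" step, something along these lines should work. This is only a sketch: the output file names are placeholders, it assumes scale_run.py can be executed via runpy, and it assumes the profiler session object exposes a save method for the raw output.

```python
# Sketch of the "run pyinstrument" step above. File names are placeholders,
# and scale_run.py may need command-line arguments injected via sys.argv.
import runpy
import sys

from pyinstrument import Profiler

SCRIPT = "src/scripts/profiling/scale_run.py"

profiler = Profiler(interval=0.001)  # default sampling interval, i.e. ~1 frame/ms
profiler.start()
try:
    # Execute the profiling script as if it were invoked from the command line.
    sys.argv = [SCRIPT]  # append any arguments scale_run expects here
    runpy.run_path(SCRIPT, run_name="__main__")
finally:
    profiler.stop()

# Human-readable output we could publish (htmlpreview, GitHub Pages, ...).
with open("scale_run_profile.html", "w") as html_file:
    html_file.write(profiler.output_html())

# Raw session output so the results can be reloaded and re-rendered later.
profiler.last_session.save("scale_run.pyisession")
```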
A couple of options (more details in this file):

- Use asv run --profile so that we collect both benchmarking and profiling outputs, putting them somewhere. The profiling results won't be rendered in HTML or another human-readable format, so we'll need another tool for this.
- Use pyinstrument, in a similar vein to this example. We can manually extract something like the cpu_time to use as a rough benchmarking estimate (relying on the Azure machines being of reasonably similar spec; see the sketch at the end of this comment), and retain the profiling HTML files, publishing them somewhere ourselves. Benchmarking won't be as accurate, but this provides the profiling information in a much more usable way and doesn't require a dedicated machine.
- Use both asv and pyinstrument. This gives us the best of both, but at a heavy compute cost, and it still requires a dedicated machine. We'd also have to investigate how the two HTML deployments play together.

The wgraham/asv-benchmark and wgraham/pyinstrument-profiling-ci branches have (locally working, still need to fix the broken tests!) implementations of both ASV and pyinstrument for the tasks above (on a 1-month-long simulation so the results get produced in ~2 minutes).
Opinions welcome: the github-pages branch of this repository is unused, so we can initially send the HTML outputs there for viewing.
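As a rough illustration of the benchmarking-by-proxy idea in the second option, the saved session can be reloaded and its headline timings read off. The file name is a placeholder, and Session.load / cpu_time are the pyinstrument attributes I believe expose this, so worth double-checking.

```python
# Sketch: extract rough benchmark numbers from a saved pyinstrument session.
from pyinstrument.session import Session

session = Session.load("scale_run.pyisession")  # placeholder file name

print(f"wall-clock duration: {session.duration:.1f} s")
print(f"CPU time:            {session.cpu_time:.1f} s")
```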
Some notes from a meeting of @tamuri, @willGraham01 and myself today to discuss this issue:

- Store results either in a dedicated branch of the TLOmodel repository or in a separate repository, using a simple nested directory structure for organizing results, similar to that created by the Julia BenchmarkCI.jl package (see example output for the ParticleDA.jl repository).
- Preference is for a separate repository, named TLOmodel-outputs / TLOmodel-profiling or similar, as this would avoid the downside of the dedicated-branch approach of possibly adding to issues around the already large size of the repository, and would be in line with longer-term aims of creating a TLOmodel organization and splitting up the existing repository.
- Keep the raw profiler output as well (the pyisession file for pyinstrument).
- On the kinds of things to monitor: for disk I/O, psutil.disk_io_counters might be the way to go (see the sketch at the end of these notes).

NOTE: Even a 1-month simulation produces a pyisession file that is ~300MB, which is well above GitHub's 100MB standard limit. We can either … (1 frame/ms anyway).

At some point, we can move the profiling repo into the TLOmodel org (https://github.com/TLOmodel).
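A minimal sketch of the disk I/O idea, assuming we bracket the profiled run with two psutil snapshots; note these counters are system-wide, so any other activity on the runner will leak into the numbers.

```python
# Sketch: approximate disk I/O attributable to the simulation by differencing
# system-wide counters taken before and after the profiled run.
import psutil

before = psutil.disk_io_counters()
# ... run scale_run under the profiler here ...
after = psutil.disk_io_counters()

print(f"read:    {(after.read_bytes - before.read_bytes) / 1e6:.1f} MB")
print(f"written: {(after.write_bytes - before.write_bytes) / 1e6:.1f} MB")
```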
Closing this as the profiling workflow is now capturing statistics and working reliably.
We would like to be able to track how the timings measured in profiling runs of the src/scripts/profiling/scale_run.py script change as new pull requests are merged in. This would help identify when PRs lead to performance regressions and allow us to be more proactive in fixing performance bottlenecks.

Ideally this should be automated using GitHub Actions workflows. Triggering the workflow on pushes to master would give the most detail, in terms of giving a direct measurement of the performance differences arising from a particular PR, but when lots of PRs are going in it could potentially create a large backlog of profiling runs, so an alternative would be to run on a schedule (for example nightly) using the cron event. It would probably also be worth allowing triggering either using the workflow_dispatch event or using the comment-triggered workflow functionality, to allow manually triggering in PRs that it is thought might have a significant effect on performance before merging.

Key questions to be resolved are what profiling outputs we want to track (for example at what level of granularity, using which profiling tool) and how we want to visualize the outputs. One option would be to save the profiler output as a workflow artifact. While this would be useful in allowing access to the raw profiling data, the only option for accessing workflow artifacts appears to be downloading the artifact as a compressed zip file, so this is not necessarily itself that useful for visualizing the output. One option for visualizing the profiling results would be to use the GitHub Actions job summary, which allows using Markdown to produce customized output shown on the job summary page (sketched below). Another option would be to output the profiling results to HTML files and then deploy these to either a GitHub Pages site or potentially to a static site on Azure storage.
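As a sketch of the job summary option: GitHub Actions exposes a GITHUB_STEP_SUMMARY file that a job can append Markdown to. Which statistics to surface, and the session file name, are assumptions here.

```python
# Sketch: append headline profiling statistics to the GitHub Actions job summary.
import os

from pyinstrument.session import Session

session = Session.load("scale_run.pyisession")  # placeholder file name

# GITHUB_STEP_SUMMARY is set automatically inside Actions jobs and points to a
# file whose Markdown contents are rendered on the job summary page.
with open(os.environ["GITHUB_STEP_SUMMARY"], "a") as summary:
    summary.write("## scale_run profiling\n\n")
    summary.write("| statistic | value |\n|---|---|\n")
    summary.write(f"| wall-clock duration (s) | {session.duration:.1f} |\n")
    summary.write(f"| CPU time (s) | {session.cpu_time:.1f} |\n")
    summary.write(f"| samples | {session.sample_count} |\n")
```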
Potentially useful links
The airspeed velocity (asv) package allows tracking the results of benchmarks of Python packages over time and visualizing the results as plots in a web interface. While focused on suites of benchmarks, it also has support for running single benchmarks with profiling (a minimal benchmark sketch is shown below).
htmlpreview allows directly previewing HTML files in a GitHub repository; GitHub serves such files with the "text/plain" content type, so browsers will not otherwise render them.
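For reference, an asv benchmark is just a function or method whose name starts with time_, discovered from a benchmarks directory. A minimal, hypothetical suite wrapping scale_run might look like the sketch below; the file location, timeout and invocation are all assumptions, and an asv.conf.json would also be needed.

```python
# benchmarks/scale_run_bench.py -- hypothetical asv benchmark suite;
# asv discovers and times any method whose name starts with "time_".
import runpy
import sys


class ScaleRunSuite:
    timeout = 3600  # seconds; generous, since even short simulations are slow

    def time_scale_run(self):
        # Execute the profiling script as if run from the command line;
        # arguments to shorten the simulation would be injected via sys.argv.
        script = "src/scripts/profiling/scale_run.py"
        sys.argv = [script]
        runpy.run_path(script, run_name="__main__")
```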