ImperialCollegeLondon / pyrealm

Development of the pyrealm package, providing an integrated toolbox for modelling plant productivity, growth and demography using Python.
https://pyrealm.readthedocs.io/
MIT License

Fixing profiling issues #208

Closed davidorme closed 3 months ago

davidorme commented 3 months ago

Description

This PR resolves the profiling issues described in #207. It got a bit larger than expected, as there were a few interlinked issues to untangle in order to get clean profiling and benchmarking workflows.

I have a broader concern that this benchmarking is going to be continually throwing up issues that arise from different runner specs, but we'll just have to see.

But, that aside: does this all look sane? Does the new graph make sense? Actually, something is wrong there: the plot from a previous failed run hasn't been replaced by the one from the most recent passing run.

@tztsai - it would be great if you could have a look at this, but I realise you're on another project.

Type of change

Key checklist

Further checks

codecov-commenter commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.17%. Comparing base (971d1a3) to head (1a99181).

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop     #208   +/-   ##
========================================
  Coverage    95.17%   95.17%
========================================
  Files           28       28
  Lines         1701     1701
========================================
  Hits          1619     1619
  Misses          82       82
```

:umbrella: View full report in Codecov by Sentry.

davidorme commented 3 months ago

The actions are all now passing: https://github.com/ImperialCollegeLondon/pyrealm/actions/runs/8633260571

I think the auto-commit is triggering a new push that is then stalling somehow. Those profiling auto-commits probably don't need to run any CI: we could add `[skip ci]` to the commit message?
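A minimal sketch of that idea, assuming the results are committed with a plain `git commit` run step (the step name, paths and commit message here are placeholders, not the workflow's actual ones):

```yaml
# Hypothetical auto-commit step: GitHub Actions skips new workflow runs for
# pushes whose head commit message contains "[skip ci]".
- name: Commit profiling results
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add profiling/
    git commit -m "Update profiling results [skip ci]" || echo "Nothing to commit"
    git push
```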

davidorme commented 3 months ago

I think we've got issues with the benchmarking. It is now working as we intended, but three runs of essentially the same code (only the profiling CI workflow is changing) are giving wildly different relative run times. The two profiling graphs updated in this commit (https://github.com/ImperialCollegeLondon/pyrealm/pull/208/commits/d6dff3497a1b3400e09b7225ac565be012e6c9b3) show the issue.

It could be that the profiling tests have too small a load to give consistent behaviour (they run really fast), or it could be that I've somehow randomised the sort order. I don't think that's the case, though: I manually triggered failures in testing by altering the database, and the correct processes failed the benchmarking. My guess is that differences in runner architecture are going to make this process hard to use within CI. My intuition is that we need a single dedicated benchmarking machine to run these tests?
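If we do end up needing a single machine, one option would be to pin the benchmarking job to a dedicated self-hosted runner. A rough sketch only, with the job name, runner label and script path as placeholders:

```yaml
# Hypothetical job pinned to one self-hosted runner, so that benchmark
# timings are always collected on the same hardware.
benchmark:
  runs-on: [self-hosted, benchmarking]
  steps:
    - uses: actions/checkout@v4
    - name: Run benchmarking
      run: python profiling/run_benchmarking.py
```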

davidorme commented 3 months ago

Also note that, with the call graph copy in the benchmarking job, the failing run_benchmarking.py clobbers that line in the run section, so the call graph is not copied when benchmarking fails. It needs its own step.
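Something like the following sketch would do it, assuming the current single run step is split in two (the step names and file paths are illustrative); `if: always()` makes the copy step run even when the benchmarking step fails:

```yaml
# Hypothetical split into two steps: the call graph copy runs even if the
# benchmarking comparison fails, so the graph can still be inspected.
- name: Run benchmarking
  run: python profiling/run_benchmarking.py

- name: Copy call graph
  if: always()
  run: cp profiling/call-graph.svg profiling/report/
```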

MarionBWeinzierl commented 3 months ago

I would suggest we try going back to a bigger problem size, to make sure we rule out random noise as a major factor in the runtimes.

davidorme commented 3 months ago

I agree - I think we can simply tile the current inputs to increase the load. A couple of other things:

  1. Those plots clearly show increasing variance from the top (longest running) to the bottom (shortest running), which is basically just more noise in the quick-running processes. The previous code only looked at calls that took longer than a threshold value, but I wanted to try to get a more holistic view of execution time. We could put in some kind of tuneable runtime filter, such as only including the processes that account for 95% of the total runtime.
  2. I think I've also switched the benchmarking target for a process from the maximum previous value to the minimum. I don't like using the maximum, but we could use another statistic, maybe something based on the range.

MarionBWeinzierl commented 3 months ago

As discussed on Slack, we might also want to reduce how often this runs, i.e. only run it on merges into develop or main.
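A rough sketch of that trigger change (the workflow name is a placeholder): restricting the workflow to pushes on those two branches means it only runs when a PR is merged, rather than on every PR update.

```yaml
# Hypothetical trigger: only run profiling/benchmarking when commits land on
# develop or main, not on every pull request push.
name: profiling
on:
  push:
    branches:
      - develop
      - main
```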