ImperialCollegeLondon / pyrealm

Development of the pyrealm package, providing an integrated toolbox for modelling plant productivity, growth and demography using Python.
https://pyrealm.readthedocs.io/
MIT License

Add small dataset for profiling #149

Closed tztsai closed 8 months ago

tztsai commented 9 months ago

Description

Fixes #140, #141

Type of change

Key checklist

Further checks

davidorme commented 9 months ago

Oh hang on - wait. You're intercepting `pytest --profile` and setting the test collection from that. That's linking these marks up with the https://pypi.org/project/pytest-profiling/ plugin.

tztsai commented 9 months ago

CLI Interface:

davidorme commented 9 months ago

I'm just not sure we need to tie the test collection process to the setting of the profiling option. With one mark we can do `pytest -m "profiling" --profile` as the most common use case, to profile only the specific profiling tests, but also run `pytest --profile` or `pytest -m "not profiling" --profile`.

Most of the time we're going to want to run `pytest -m "not profiling"` to simply run the unit testing and check a commit hasn't broken stuff, but we might want to profile more than just the specific profiling suite. It also occurs to me that once #143 is closed, we can also mark tests as 'regression' and maybe exclude those except on PR. But that's getting ahead of myself.
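For reference, a mark like this only needs to be registered once; this is a minimal sketch, assuming a single `profiling` mark as suggested above, and is not the exact code in this PR:

```python
# conftest.py -- minimal sketch of registering the "profiling" mark discussed
# above, so that `pytest -m "profiling"` / `pytest -m "not profiling"` run
# without unknown-mark warnings.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "profiling: tests intended for the profiling suite"
    )
```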

tztsai commented 9 months ago

> Oh hang on - wait. You're intercepting `pytest --profile` and setting the test collection from that. That's linking these marks up with the https://pypi.org/project/pytest-profiling/ plugin.

Yes, so instead of `poetry run pytest -m "profiling" --profile`, we can simply run `poetry run pytest --profile`, and tests not marked with `profiling` or `profiling_only` will be skipped.

However, we may do a combination of both. I can remove the `profiling_only` marker and run `poetry run pytest -m "not profiling"` to skip the tests intended for profiling. It may not be necessary to divide the tests into three groups as I am doing now.
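For illustration, a rough sketch of the kind of conftest.py hook being described here, assuming the `--profile` flag added by pytest-profiling is readable via `config.getoption`; the actual implementation in this PR may differ:

```python
# conftest.py -- sketch of tying test collection to the --profile flag, as
# described above. The --profile option itself comes from the pytest-profiling
# plugin; the skip logic is illustrative, not the exact code in this PR.
import pytest


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--profile", default=False):
        return
    skip_unmarked = pytest.mark.skip(reason="only profiling tests run under --profile")
    for item in items:
        if "profiling" not in item.keywords:
            item.add_marker(skip_unmarked)
```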

tztsai commented 9 months ago

> I'm just not sure we need to tie the test collection process to the setting of the profiling option. With one mark we can do `pytest -m "profiling" --profile` as the most common use case, to profile only the specific profiling tests, but also run `pytest --profile` or `pytest -m "not profiling" --profile`.
>
> Most of the time we're going to want to run `pytest -m "not profiling"` to simply run the unit testing and check a commit hasn't broken stuff, but we might want to profile more than just the specific profiling suite. It also occurs to me that once #143 is closed, we can also mark tests as 'regression' and maybe exclude those except on PR. But that's getting ahead of myself.

OK, I think it's simpler to just use the markers together with command-line arguments, so I can remove the conftest.py. Accordingly, I can update .github/workflows/pyrealm_ci.yaml to run profiling when there is a push to the main branch.

codecov-commenter commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (b9a454f) 93.38% compared to head (0d6d174) 93.38%.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop     #149   +/-   ##
========================================
  Coverage    93.38%   93.38%
========================================
  Files           21       21
  Lines         1134     1134
========================================
  Hits          1059     1059
  Misses          75       75
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

tztsai commented 9 months ago

In my latest update, the program can automatically run profiling on the LFS dataset when there is a push to develop. It will then automatically create a PR to the same branch, adding or updating the profiling CSV report, the call graph (if it has been generated), and some benchmark plots in the profiling folder. The plots visualize performance comparisons across functions and across successive commits.

However, the SVG call graph currently cannot be generated in the GitHub Workflow environment, even though the OS is Ubuntu and I think poetry has also installed all the necessary dependencies...
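As an illustration only, a comparison plot of that kind could be assembled from the accumulated CSV reports roughly as below; the file paths and the `commit`, `function` and `cumtime` columns are assumptions, not the actual report.py schema:

```python
# Sketch of a benchmark plot across commits. The CSV path and the "commit",
# "function" and "cumtime" columns are hypothetical placeholders for whatever
# report.py actually writes to the profiling folder.
import pandas as pd
import matplotlib.pyplot as plt

report = pd.read_csv("profiling/profiling_report.csv")  # hypothetical path
slowest = report.groupby("function")["cumtime"].max().nlargest(10).index
pivot = (
    report[report["function"].isin(slowest)]
    .pivot(index="commit", columns="function", values="cumtime")
)
ax = pivot.plot(marker="o", figsize=(10, 6))
ax.set_ylabel("cumulative time (s)")
plt.tight_layout()
plt.savefig("profiling/benchmark_plot.png")  # hypothetical output name
```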

tztsai commented 9 months ago

I have updated report.py so that it prints a message to stderr when the time consumption increases by more than 2% compared to the last profiling record. The GitHub workflow then checks its stderr output and fails the action after submitting a PR containing the new profiling results.
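A minimal sketch of that check, assuming a history CSV with a `total_time` column per recorded run; the real report.py layout may differ:

```python
# Sketch of the >2% regression check described above. The history CSV and its
# "total_time" column are assumptions about what report.py records.
import sys

import pandas as pd

history = pd.read_csv("profiling/profiling_history.csv")  # hypothetical path
previous = history["total_time"].iloc[-2]
current = history["total_time"].iloc[-1]

if current > previous * 1.02:  # more than 2% slower than the last record
    print(
        f"Performance regression: total time rose from {previous:.2f}s to "
        f"{current:.2f}s (> 2% increase)",
        file=sys.stderr,
    )
```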

davidorme commented 9 months ago

Would it be cleaner for profiling.py to return a value to an environment variable rather than monitoring stderr?

tztsai commented 9 months ago

> Would it be cleaner for profiling.py to return a value to an environment variable rather than monitoring stderr?

I agree. I have updated the script accordingly.

davidorme commented 9 months ago

> Would it be cleaner for profiling.py to return a value to an environment variable rather than monitoring stderr?

> I agree. I have updated the script accordingly.

That wasn't quite what I meant, but I was unclear. My idea was that profile.py uses sys.exit(0) or sys.exit(1) to return a success code. Then the profiling step can capture the resulting exit code (maybe using `profiling_pass=$?`) and use that to control the next steps.

Practically that works in a very similar way, but it is clearer in the profiling code that the profiling_pass variable is the outcome of an exit code. At the moment, `steps.profiling.outcome == 'success'` requires inspection of the profile.py code to understand where it comes from. It feels more like canonical bash this way, but this is picky!
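A sketch of the suggested pattern: the script reports the result through its exit status, so the workflow step can capture `$?` directly. The `regression_detected` helper and the hard-coded timings are placeholders, not the actual profile.py code:

```python
# Sketch of the exit-code approach suggested above: the script signals a
# regression through its exit status so the workflow can capture $?.
import sys

THRESHOLD = 1.02  # fail when total runtime grows by more than 2%


def regression_detected(previous: float, current: float) -> bool:
    """Return True when the new total runtime exceeds the old one by >2%."""
    return current > previous * THRESHOLD


if __name__ == "__main__":
    # In the real script these values would come from the profiling records;
    # they are hard-coded here only to keep the sketch self-contained.
    previous_time, current_time = 10.0, 10.5
    sys.exit(1 if regression_detected(previous_time, current_time) else 0)
```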

tztsai commented 9 months ago

> > Would it be cleaner for profiling.py to return a value to an environment variable rather than monitoring stderr?
> >
> > I agree. I have updated the script accordingly.
>
> That wasn't quite what I meant, but I was unclear. My idea was that profile.py uses sys.exit(0) or sys.exit(1) to return a success code. Then the profiling step can capture the resulting exit code (maybe using `profiling_pass=$?`) and use that to control the next steps.
>
> Practically that works in a very similar way, but it is clearer in the profiling code that the profiling_pass variable is the outcome of an exit code. At the moment, `steps.profiling.outcome == 'success'` requires inspection of the profile.py code to understand where it comes from. It feels more like canonical bash this way, but this is picky!

But if the program exits with code 1, steps.profiling.outcome will not be 'success', and the workflow as it stands will not save the profiling reports. However, I am exploring the continue-on-error setting for this.

MarionBWeinzierl commented 9 months ago

As discussed in the dev meeting today, this PR should be split up into two PRs: one dealing with #141 (and #152) and one dealing with #142.

tztsai commented 9 months ago

> As discussed in the dev meeting today, this PR should be split up into two PRs: one dealing with #141 (and #152) and one dealing with #142.

I have moved most of my updates to the new PR #156, which is ready to be merged. I will subsequently use this PR to modularise the profiling code and update the LFS dataset.