ecmwf / anemoi-inference

Apache License 2.0

remove prog fields variable #52

Closed cathalobrien closed 4 days ago

cathalobrien commented 4 days ago

Not quite a bugfix or a feature, just a small change that can save some memory.

- prognostic_fields = y_pred[..., prognostic_output_mask]
...
- input_tensor_torch[:, -1, :, prognostic_input_mask] = prognostic_fields
+ input_tensor_torch[:, -1, :, prognostic_input_mask] = y_pred[..., prognostic_output_mask]

Currently we create a prognostic_fields variable and then assign it to input_tensor_torch a few lines later. Nothing happens to prognostic_fields or y_pred in between, so I guess this variable is just there for readability?

The issue is that the memory behind prognostic_fields is never freed: after the first iteration, the variable stays alive and we carry it around for the rest of the run. At 9km resolution, this is 2.2GB of memory (~3% of an H100's memory). We can save this memory by removing the variable and using y_pred directly.
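To illustrate the lifetime effect, here is a minimal pure-Python sketch, not the runner code. `Buffer`, `select` and `rollout` are hypothetical stand-ins: `Buffer` plays the role of a tensor, and `select` mimics mask indexing under the assumption that it returns a copy (as advanced indexing does in PyTorch). The sketch relies on CPython reference counting to free objects as soon as the last reference disappears.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a large tensor; tracks which instances are alive."""

    live = weakref.WeakSet()

    def __init__(self, values):
        self.values = list(values)
        Buffer.live.add(self)


def select(buf, mask):
    """Mimic mask indexing, assumed to return a new buffer (a copy)."""
    return Buffer(v for v, keep in zip(buf.values, mask) if keep)


def rollout(steps, keep_named_temporary):
    """Return, per step, how many extra buffers are alive when the step starts."""
    mask = [True, False, True]
    state = Buffer([0.0, 0.0])  # plays the role of input_tensor_torch
    baseline = len(Buffer.live)  # buffers that exist before the loop
    extra_at_step_start = []
    for step in range(steps):
        extra_at_step_start.append(len(Buffer.live) - baseline)
        y_pred = Buffer([step + 1.0] * 3)
        if keep_named_temporary:
            # The name keeps the copy alive until it is rebound in the next step.
            prognostic_fields = select(y_pred, mask)
            state.values[:] = prognostic_fields.values
        else:
            # The unnamed copy is released as soon as the assignment finishes.
            state.values[:] = select(y_pred, mask).values
        # Dropped here only to isolate the effect of the temporary (an assumption).
        del y_pred
    return extra_at_step_start
```

On CPython, `rollout(3, keep_named_temporary=True)` returns `[0, 1, 1]`: one extra buffer is carried into every step after the first. With `keep_named_temporary=False` it returns `[0, 0, 0]`.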

I have annotated the memory usage of five steps of inference at 9km below. The relevant part is the blue anemoi_inference/src/anemoi/inference/runner/py:503 block, which appears after the first step. This is the prognostic_fields variable, and it persists through all subsequent steps.

[Image: Screenshot 2024-11-19 at 11 46 20]

I compared the output between this version and the original and saw no major difference. I'm interested in feedback, though: it is a relatively minor memory gain, but every little helps. And maybe there are future plans which require prognostic_fields to be its own variable?
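For reference, a comparison of this kind could look like the sketch below. It is only an illustration and assumes both outputs have been flattened into plain sequences of floats. The helper names `max_abs_diff` and `outputs_match` are hypothetical and not part of anemoi-inference.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, rel_tol=0.0, abs_tol=0.0):
    """True if every pair of values agrees within the given tolerances.

    With both tolerances at zero this demands exact equality, which is the
    natural expectation when a change only removes a temporary variable.
    """
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )
```

Starting with zero tolerance and loosening it only if needed makes it clear whether the two versions are bitwise identical or merely close.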

FussyDuck commented 4 days ago

CLA assistant check
All committers have signed the CLA.

codecov-commenter commented 4 days ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 88.76%. Comparing base (93bd60e) to head (5ae9cea). Report is 3 commits behind head on develop.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop      #52   +/-   ##
========================================
  Coverage    88.76%   88.76%
========================================
  Files            3        3
  Lines           89       89
========================================
  Hits            79       79
  Misses          10       10
```
