RadioAstronomySoftwareGroup / pyuvsim

A ultra-high precision package for simulating radio interferometers in python on compute clusters.
https://pyuvsim.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
43 stars 7 forks source link

Sim Setup Performance Enhancements #410

Closed steven-murray closed 1 year ago

steven-murray commented 1 year ago

Description

This PR adds the following performance-enhancing features in the context of setting up a simulation from config:

  1. Ability to do partial reading of beams (if they are beamfits format), to reduce peak RAM usage. Principally, it adds the ability to specify select: in the telescope YAML file, to which can be added freq_range, az_range and za_range. For instance, if a simulator only requires the beam above the horizon, they may want to set za_range: [0, 95] (where we use 95 so that interpolation to the horizon doesn't quite hit the edge). It also adds the possibility of a select: parameter in the main obsparam file, under the telescope: heading. Here, I've only added one possible parameter to the select: heading, which is freq_buffer. If set, this will pass through its information so that only frequencies from (freq_array.min - buffer, freq_array.max + buffer) are read in (where freq_array is the UVData frequencies). It seems reasonable that only frequencies surrounding those that will actually be simulated should be read in (again, with some buffer to enable interpolation with k>1).
  2. Ability to turn off the checks on the built UVData object (especially the run_check_acceptability which can double peak RAM)
  3. Ability to not reorder the blt axis after initial setup. While pyuvsim requires a re-ordering, other simulators do not, and doing so can triple the peak RAM at setup.
  4. Manually sets the blt_order attribute of the UVData upon setup, which can help simulators know if the blt axis needs to be reordered to suit their purposes.
  5. Adds many INFO-level log-statements throughout the setup. These shouldn't general be shown, but can be switched on by the user in application code. I am open to suggestions on which of these statements are really necessary and whether you'd prefer a different log-level (or different logging mechanism entirely).

NOTE: this PR also pins pyradiosky<0.2, due to API changes in pyradiosky that break unrelated code in pyuvsim (unrelated to this PR, that is). That will be fixed by #417.

Motivation and Context

This PR was developed alongside trying to get the H4C simulations run on GPU (using hera_sim calling vis_gpu). The main concerns here were peak RAM usage and timing performance.

The RAM usage is a particular concern because there is quite a limited amount of RAM available on the host of the GPU nodes (23.75GB). The simulations we're running are reasonably large (1 freq, 17280 times, 8000 baselines, 4 pols), but should easily fit in this amount of RAM. However, there were several problems with the setup that meant that the peak RAM would exceed this amount. This PR significantly reduces this peak RAM (although, in the end I still had to run the simulations in chunks of 5760 times each).

The wall-time performance is another concern that was previously overlooked. Simulations of this size on the CPU take ~10-20 hours, so the setup/write time is negligible. However, on the GPU, the simulation takes ~15min, which makes every minute of setup time undesirable. Using the changes in this PR (and some changes in hera_sim to accompany it) I was able to reduce setup/check/write time from ~30min to mere seconds.

Types of changes

Checklist:

For all pull requests:

New feature checklist:

codecov[bot] commented 1 year ago

Codecov Report

Base: 99.13% // Head: 99.15% // Increases project coverage by +0.01% :tada:

Coverage data is based on head (c2c9c9c) compared to base (8ae4dee). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #410 +/- ## ========================================== + Coverage 99.13% 99.15% +0.01% ========================================== Files 13 13 Lines 2197 2241 +44 ========================================== + Hits 2178 2222 +44 Misses 19 19 ``` | [Impacted Files](https://codecov.io/gh/RadioAstronomySoftwareGroup/pyuvsim/pull/410?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup) | Coverage Δ | | |---|---|---| | [pyuvsim/antenna.py](https://codecov.io/gh/RadioAstronomySoftwareGroup/pyuvsim/pull/410?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup#diff-cHl1dnNpbS9hbnRlbm5hLnB5) | `94.54% <100.00%> (+0.10%)` | :arrow_up: | | [pyuvsim/simsetup.py](https://codecov.io/gh/RadioAstronomySoftwareGroup/pyuvsim/pull/410?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup#diff-cHl1dnNpbS9zaW1zZXR1cC5weQ==) | `99.40% <100.00%> (+0.02%)` | :arrow_up: | | [pyuvsim/telescope.py](https://codecov.io/gh/RadioAstronomySoftwareGroup/pyuvsim/pull/410?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup#diff-cHl1dnNpbS90ZWxlc2NvcGUucHk=) | `100.00% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=RadioAstronomySoftwareGroup)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

steven-murray commented 1 year ago

@bhazelton I think this is ready now. I'm getting that three tests fail, and I'm not sure why. It doesn't seem to have anything to do with the code that I modified (but I'm likely wrong, of course).

bhazelton commented 1 year ago

I've fixed the test issues in #409, so once that is merged we can rebase this. Note that that PR includes some new pre-commit checks, so after rebasing, you'll probably want to run pre-commit to make sure checks will pass (checks do not automatically run when you rebase like the they do when you commit). I'm happy to help with or do the rebase if you'd like.

steven-murray commented 1 year ago

@bhazelton I've done the rebase and also covered some lines that weren't covered. The things that are failing are weird new warnings from pytest, that I don't understand (nothing to do with my additions). The project coverage went down but the patch coverage is 100% so I can't do anything about that.

steven-murray commented 1 year ago

@bhazelton I think this is ready for review again -- any idea on the min deps error?

steven-murray commented 1 year ago

@bhazelton I think this is good to go again -- the warnings test is only failing on the codecov upload.