RadioAstronomySoftwareGroup / pyuvdata

A pythonic interface for radio astronomy interferometry data (uvfits, miriad, others)
https://pyuvdata.readthedocs.io/en/latest/index.html
BSD 2-Clause "Simplified" License

performance monitoring and benchmarking #729

Open dannyjacobs opened 4 years ago

dannyjacobs commented 4 years ago

Occasionally I hear that pyuvdata is slow, though without further investigation this complaint is impossible to decouple from the size of the data being read.* However, since we do not currently track the execution time of our various tasks, a change could be introduced that increases execution time without anyone noticing. This is difficult to monitor at scale because large files and extended execution times are not easily supported within the current testing infrastructure. Here are a few possible things we could do:

  1. Monitor the execution time of all tests. This is a crude metric, as we know that execution time can be affected by exogenous factors related to the underlying cloud infrastructure or install times.
  2. Monitor the execution time of specific existing tests. This would be a more precise datum than the time for all tests and would expose things that grossly affect read time, but since the test files are small it would not expose issues that scale badly with the number of times, freqs, etc.
  3. Add tests which focus on timing but use the existing test files. This could include things like reading a file many times and averaging the read time, generating a large number of files and concat-reading them, etc.
  4. Setups that require more resources. It's not clear what the break point is.
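
As a rough illustration of option 3 (repeat a read many times and summarize the timings), here is a minimal sketch using only the standard library. The helper names (`time_reads`, `summarize`, `exceeds_baseline`), the 1.5x tolerance, and the stand-in `read_bytes` reader are all hypothetical and not part of pyuvdata; a real benchmark would pass a callable that reads one of the existing test files into a UVData object.

```python
import statistics
import tempfile
import time
from pathlib import Path


def time_reads(read_func, path, repeats=20):
    """Call read_func(path) `repeats` times; return per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_func(path)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce raw timings to a few summary numbers."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
    }


def exceeds_baseline(durations, baseline_seconds, tolerance=1.5):
    """Flag a possible regression when the median exceeds tolerance * baseline.

    The tolerance is an arbitrary illustrative value, not a recommendation.
    """
    return statistics.median(durations) > tolerance * baseline_seconds


def read_bytes(path):
    """Stand-in reader so the sketch runs anywhere.

    For pyuvdata one would instead pass a callable that builds a UVData
    object and reads `path` into it (assumption; not exercised here).
    """
    return Path(path).read_bytes()


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "sample.bin"
    sample.write_bytes(b"\0" * 1_000_000)  # 1 MB dummy file
    timings = time_reads(read_bytes, sample, repeats=10)
    stats = summarize(timings)
    print(stats)
```

The median is used for the regression check because a single slow run (for example, from a busy CI machine) skews the mean far more than the median.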

*This issue is a digest of a discussion on the 3 Dec 2019 pyuvdata telecon.

bhazelton commented 4 years ago

@dannyjacobs should this be labelled as UVData related or are you worried about other objects as well?

dannyjacobs commented 4 years ago

I think just UVData. Sorry, I didn’t think about labeling.

bhazelton commented 4 years ago

@mkolopanis has implemented several recent speed-ups, both of the code and of the tests themselves. We do get total test suite timing from the CIs, but we could add pytest's --durations option to get the timing of the slowest n tests.
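
For reference, a minimal sketch of turning on the durations report, assuming the test suite is run with pytest. On the command line this is `pytest --durations=10`; the Python wrapper below does the same thing in-process. The function names are hypothetical, and the value 10 is just an example.

```python
def durations_args(n_slowest=10, extra_args=()):
    """Build a pytest argument list that reports the n slowest durations."""
    return [f"--durations={n_slowest}", *extra_args]


def run_tests_with_durations(n_slowest=10, extra_args=()):
    """Run pytest in-process with the durations report; return its exit code."""
    # Imported lazily so this sketch can be loaded without pytest installed.
    import pytest

    return pytest.main(durations_args(n_slowest, extra_args))


# Example (not executed here): run_tests_with_durations(10, ["-q"])
```

Note that pytest reports setup, call, and teardown durations as separate entries, so a slow fixture shows up on its own line and is not folded into the test that uses it.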

bhazelton commented 4 years ago

speeding up _key2_inds might also be related: #201

mkolopanis commented 4 years ago

Some other issues/PRs related to some recent speed ups: #800 #813 #815 #818 #825 #834 #840