mkolopanis closed this pull request 2 years ago.
@bhazelton @jpober I mentioned this in slack but wanted to get it tagged to the PR too.
This ordering issue is bringing up something interesting too. Comparing with a reference simulation file is part of our unit tests, which should be fine, but if you look through this output you'll notice the reference uvdata object is ordered in (time, antenna1) format. The reference is on the RHS of this comparison.
The "problem" is this line in the task index creation: `blti = bl_i + time_i * Nbls  # baseline is the fast axis`. It's assuming that the blt index is in (time, baseline) ordering.
The task does then go look up the exact baseline for this index and computes it, so I think the output data shouldn't matter, right? It just happens to be in a different order than what the code thinks. I guess this is interesting because I actually cannot tell if there is an issue at the heart of this or not.
I think I'm coming to decide there is no actual issue with the simulation, since we look up the baseline for each index; the data was just not in the same order the code thought it might be in.
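For reference, here is a toy version of the indexing assumption being discussed (hypothetical names, not the pyuvsim code): with baseline the fast axis, `blti = bl_i + time_i * Nbls` lands on the right row, and the per-index lookup is what keeps each task attached to the baseline actually stored at that row.

```python
import numpy as np

# Toy blt axis in (time, baseline) order: baseline is the fast axis.
Ntimes, Nbls = 3, 2
time_per_blt = np.repeat(np.arange(Ntimes), Nbls)  # [0, 0, 1, 1, 2, 2]
bl_per_blt = np.tile(np.arange(Nbls), Ntimes)      # [0, 1, 0, 1, 0, 1]

for time_i in range(Ntimes):
    for bl_i in range(Nbls):
        blti = bl_i + time_i * Nbls  # the formula from the task index
        # The simulator looks these up per index, so the task always
        # computes the baseline actually stored at row blti:
        assert bl_per_blt[blti] == bl_i
        assert time_per_blt[blti] == time_i
```

If the object were ordered differently, the lookup would still return the correct antennas for each row; only the iteration order would differ from what the formula implies.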
Something else has come up though: in order to be able to call reorder_blts, I needed to have metadata-only uvdata objects on all the non-rank-0 nodes. I changed the init_uvdata_from_? functions to return metadata-only objects. This does add a memory overhead for each node now, though.
Merging #371 (26fc3e8) into main (1ef8cdb) will increase coverage by 0.00%. The diff coverage is 100.00%.

```
@@           Coverage Diff           @@
##             main     #371   +/-  ##
=======================================
  Coverage   99.08%   99.09%
=======================================
  Files          13       13
  Lines        1963     1980    +17
=======================================
+ Hits         1945     1962    +17
  Misses         18       18
```

| Impacted Files | Coverage Δ |
|---|---|
| pyuvsim/simsetup.py | 99.88% <100.00%> (+<0.01%) :arrow_up: |
| pyuvsim/uvsim.py | 98.53% <100.00%> (+0.07%) :arrow_up: |

Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 1ef8cdb...26fc3e8. Read the comment docs.
Ok, if the data are correct then I'm not fussed about the ordering. But I thought @steven-murray had seen instances when the data were not correct.
Oh, maybe I misread it, but then if they are incorrect that's an issue :thinking: I would think they would be correct now either way, since we're forcing the (time, baseline) ordering, right?
What I found in the hera_sim wrapper was that for certain input orderings, the output was not correct. Internally, the simulation might be correct, but when you do e.g. `uvdata.get_data((i, j, 'xx'))` it doesn't match what you know to be true for that baseline (unless the input was ordered by time).
Okay, so maybe I know what was happening. It was ~partially~ probably assigning data to the wrong baseline before.
Inside the task creation this happens, where `tasks_shape = (Nfreqs, Ntimes, Nbls)`; this is where it assumed (time, baseline) ordering.
https://github.com/RadioAstronomySoftwareGroup/pyuvsim/blob/1ef8cdbfd98fc6f49b598f75d759852407601ebf/pyuvsim/uvsim.py#L318-L338
Then as it iterates over the `blti` index, it assumes that the `bl_i` associated with the baseline is moving fastest to the next baseline, when in reality it might not be. So it would get to a different `bl_i` and look up the ant_1 and ant_2 for `blti = bl_i + time_i * Nbls`, which actually might not be associated with that `bl_i`, because we're assuming this index moves the fastest when really it might not. But as I write this, I'm not sure it is making sense.
I think that makes sense. I guess the question is: do we want to just force that ordering, or do we want to fix the logic to be more sensible? I'm worried that if a uvdata object doesn't have all the times for all the baselines, this might break as well, even if we force the ordering. Maybe we could use some of the newer uvdata functionality to make this smarter? Minimally, we should force the ordering and error if this assumption breaks (i.e. if Nblts != Nbls * Ntimes).
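A minimal sketch of the "force and error" option described above (the helper name is hypothetical; `reorder_blts` is the real pyuvdata method, the rest is illustrative):

```python
def force_time_baseline(uv):
    """Force (time, baseline) ordering, erroring if the blt axis is not a
    full outer product of times and baselines (Nblts != Nbls * Ntimes)."""
    if uv.Nblts != uv.Nbls * uv.Ntimes:
        raise ValueError(
            f"Nblts ({uv.Nblts}) != Nbls * Ntimes ({uv.Nbls} * {uv.Ntimes}); "
            "cannot assume every baseline is present at every time."
        )
    # time is the slow (major) axis, baseline the fast (minor) axis
    uv.reorder_blts(order="time", minor_order="baseline")
```

This only guards the square case; handling ragged blt axes would need the order-agnostic logic discussed below.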
I feel like I'd prefer to just fix up the logic in the baseline-time creation to be order agnostic, but I'd still prefer to have the `input_uv` objects be metadata only (mostly because it makes me feel better inside).
I agree with both of those!
@steven-murray any thoughts?
I'm not sure I'm qualified to have thoughts here, but I do have a couple. One is that it would be useful if the `uvdata` object had an attribute `ordering` or something which just specified which order the object was in to begin with. Then a `uvdata.reorder_blts` could be short-circuited if it already had the right order, saving quite a bit of time. Not sure if they are useful opinions, but there you go.
So it turns out there is an attribute, `blt_order`, that records this, but it's optional and only set if the pyuvdata method is called. So you can't count on it being there unless you call the method yourself.
@bhazelton that makes sense, because someone could just have data in a file in any old order and read it in. But, we could ensure that when a file is created with the pyuvsim simsetup, it sets that attribute. At least then users have a way of doing a faster simulation if they follow the guidelines.
Right, as it stands I don't even think pyuvsim is generating uvdata objects in the "optimal" configuration. I think it is doing (time, ant1), not (time, baseline), but we should be able to verify/change that. And Bryna and I would like to make the simulation agnostic to the input blt ordering anyway.
We can just solve this by forcing blt_order to be required :tada: but those pyuvdata devs are so hard to get in contact with.
I guess what I'm saying is that making the simulation itself agnostic to the ordering may actually incur non-negligible (though not alarming) slowdowns. It's usually possible to be faster when you know the ordering of your data and can iterate through it in the optimal way, rather than having to determine which indices you need to place the simulated data in. Thus I would advocate for the simulator wrapper to force the ordering before going into the detailed simulator. Then that forced ordering could be skipped if it was already the right order.
This is basically what I have implemented in the current scheme. :thinking: We could allow arbitrary ordering and warn that a slowdown may occur if it is not (time, baseline). You know me, can never just agree with you.
Also, I think these lines confirm we use (time, ant1) ordering. We make (i, j) pairs where j is faster moving, then tile Ntimes times, so time moves slower than ant1. https://github.com/RadioAstronomySoftwareGroup/pyuvsim/blob/1ef8cdbfd98fc6f49b598f75d759852407601ebf/pyuvsim/simsetup.py#L1385-L1393
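A condensed sketch of that construction (toy values, not the actual simsetup code): j moves fastest within each time block, and the whole (i, j) block is tiled Ntimes times, so time is the slowest axis and ant1 the next slowest.

```python
import numpy as np

Nants, Ntimes = 3, 2
# (i, j) pairs with j moving fastest (autos included)
ant1, ant2 = zip(*[(i, j) for i in range(Nants) for j in range(i, Nants)])
ant_1_array = np.tile(ant1, Ntimes)  # pair block repeats within each time
ant_2_array = np.tile(ant2, Ntimes)
time_array = np.repeat(np.arange(Ntimes), len(ant1))  # time moves slowest
# -> (time, ant1) ordering, as described above
```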
That looks right, so you just also need to set `blt_order` when you create the object, so you know it's that ordering later. Then pyuvsim should have something like:

```python
if getattr(input_uv, 'blt_order', None) != ('baseline', 'time'):
    original_order = get_order(input_uv)
    input_uv.reorder_blts('baseline', 'time')  # or however you do this
else:
    original_order = ('baseline', 'time')
output_uv = simulate(input_uv)
if original_order != ('baseline', 'time'):
    output_uv.reorder_blts(*original_order)
```

I guess the thing is that we don't have a function like `get_order()`, right?
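Since pyuvdata doesn't provide a `get_order()`, a rough hypothetical sketch of inferring it from the metadata arrays might look like this. It only distinguishes the two common orderings by checking which axis is monotonically non-decreasing, and doesn't verify the minor axis within each block:

```python
import numpy as np

def get_order(uv):
    """Hypothetical: guess the blt ordering from the metadata arrays.
    Returns ('time', 'baseline'), ('baseline', 'time'), or None."""
    if np.all(np.diff(uv.time_array) >= 0):
        # time never decreases -> time is the slow (major) axis
        return ("time", "baseline")
    if np.all(np.diff(uv.baseline_array) >= 0):
        # baseline never decreases -> baseline is the slow axis
        return ("baseline", "time")
    return None  # some other ordering; caller must reorder defensively
```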
Yeah your latest suggestion is how the code behaves right now.
Then I accept.
okay, I'm digging into something else here now so let's not push this through yet.
Okay, I think it is ready for a look now. I do have a question, though. In this implementation, tasks are unraveled as (Nblts, Nfreqs), so for each blt, tasks will iterate over all the frequencies. Is this what we want, or should it go (Nfreqs, Nblts), so for each frequency we iterate over blts?
I suppose it would be worthwhile to have a test now where Nblts != Nbls * Ntimes. Nominally it should be able to handle it.
@aelanman do you have a sense for whether it matters which changes fastest, the frequency axis or the blt axis?
@bhazelton Ideally, the time axis should be the slowest. Within the task loop, the local coherency, Jones matrices, and source positions are updated whenever the time changes; the local coherency and source positions are reused otherwise. If time becomes the fastest axis, this advantage is lost. (See the function `UVEngine.set_task`.)
Yeah, that makes sense. So unraveling as (Nblts, Nfreqs) would still allow time to be the slowest for us, since this implementation forces (time, baseline) ordering. It basically switches the frequency and baseline axes for an array where Nblts = Ntimes * Nbls, but also allows for the non-square case.
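A toy illustration of that unraveling (not the pyuvsim code, and assuming the blt axis is already (time, baseline) ordered): frequency changes fastest, then baseline, and time changes slowest of all.

```python
import numpy as np

Ntimes, Nbls, Nfreqs = 2, 3, 4
Nblts = Ntimes * Nbls  # square case; the flat blt axis works either way

order = []
for task_i in range(Nblts * Nfreqs):
    blt_i, freq_i = np.unravel_index(task_i, (Nblts, Nfreqs))
    # with the blt axis (time, baseline) ordered, time_i = blt_i // Nbls
    order.append((int(blt_i) // Nbls, int(blt_i) % Nbls, int(freq_i)))
# order holds (time_i, bl_i, freq_i) tuples: freq cycles fastest,
# baseline next, and time slowest
```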
Maybe? You would get Nblts × Nfreqs float64s for time and freq, and Nblts × Nfreqs int64s for baseline and order, so 256 × Nblts × Nfreqs bits of information. I think that is 1 GiB of RAM per 32768 tasks. Right, we would have to switch them in the sort, but the idea is the same.
On Tue, Nov 2, 2021, 09:08, Adam Lanman commented on this pull request, in pyuvsim/uvsim.py (https://github.com/RadioAstronomySoftwareGroup/pyuvsim/pull/371#discussion_r741248509):

> ```
> tasks_shape = (Ntimes, Nfreqs, Nbls)
> time_ax, freq_ax, bl_ax = range(3)
> tasks_shape = (Nblts, Nfreqs)
> ```
> There's potentially a lot of memory overhead in the first few lines too, right? Also, the baseline axis should be fastest, not freq.
If it's happening in the `uvdata_to_task_iter` function, those arrays will be generated on each MPI process.
Yeah I understand this.
On Tue, Nov 2, 2021, 10:50, Adam Lanman wrote:

> If it's happening in the uvdata_to_task_iter function, those arrays will be generated on each MPI process.
So wouldn't that be 1 GiB × Npus_per_node of memory used on each node? Once it downselects, it'll be less, but up front it will make an array of size Nblts × Nfreqs on each process.
Yes, in the current prototype implementation that will happen. There may be more sly ways to do it, too. I was interested to see how the idea would be received, but it seems like we've begun focusing on details instead. I'll post something else once I've found a well-optimized solution.
On Tue, Nov 2, 2021, 11:45, Adam Lanman wrote:

> So wouldn't that be 1 GiB × Npus_per_node of memory used on each node? Once it downselects, it'll be less, but up front it will make an array of the size Nblts × Nfreqs on each process.
Looks like I was off: it is 1 MiB per ~30k tasks, not 1 GiB. I had a factor-of-1024 problem.
Your `Ntasks` is for 32 baselines?
I'm not sure I understand the question. I had divided 2^20 by my estimation of the size per task instead of 2^30.
I don't understand where "32768" tasks came from earlier. If that's Nblts * Nfreqs, then for 1024 freqs that's Nblts = 32.
This is what I did: tasks to make 1 GiB of RAM = 1 GiB / (number of arrays × element size of array), or, what I plugged into a Python terminal for 4 arrays with 64-bit elements:

```python
tasks_for_1GiB = 2 ** 30 / ((4 * 64) / 8)
```

The problem I had the first time around was that I used `2 ** 20` in the numerator, so I reported the number of tasks to make 1 MiB (the ~30k number).
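Spelled out with both numerators for comparison (same arithmetic as above):

```python
# Four 64-bit (8-byte) arrays per task -> 32 bytes per task.
bytes_per_task = 4 * 64 // 8                 # 32 bytes

tasks_for_1GiB = 2 ** 30 // bytes_per_task   # tasks needed to fill 1 GiB
tasks_for_1MiB = 2 ** 20 // bytes_per_task   # the ~30k number quoted earlier
```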
I've pushed up a new version with views, broadcasting, iterators, and slices where possible. I've empirically seen a 1e-5 to 3e-5 MiB increase in RAM usage per local task per PU.
Excellent! This looks great. Thanks for figuring that out!
@steven-murray Do you understand the herasim test failures on this PR? Also, I think it's getting close to completion if you want to have another look.
@bhazelton I don't think the breaking tests have anything to do with pyuvsim per se, and they will likely be fixed soon with a major PR in hera_sim. I think you can merge without those tests passing.
Here's an update on the benchmark from main and from the current version of this branch. Should I just add it to the base file even though it is untracked?
| Date/Time | uvsim_version | HostName | SettingsFile | cpus-per-task | Ntasks | Nnodes | MemLimit | Ntimes | Nbls | Nfreqs | Nsrcs | Nsrcs_part | MaxRSS [GiB] | Runtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-03-06-21:40:47.584612 | 1.1.2 | node474.oscar.ccv.brown.edu | settings.yaml | 2 | 10 | 1 | 50G | 21 | 500 | 256 | 3072 | 1 | 13.071 | 4:32:16.373965 |
| 2020-03-26-17:54:43.959585 | 1.1.2 | node474.oscar.ccv.brown.edu | settings.yaml | 2 | 10 | 1 | 50G | 21 | 500 | 256 | 3072 | 1 | 12.187 | 3:24:00.635442 |
| 2020-04-02-15:50:16.314578 | 1.1.2 | node473.oscar.ccv.brown.edu | settings.yaml | 2 | 10 | 1 | 50G | 21 | 500 | 256 | 3072 | 1 | 11.475 | 0:10:30.647003 |
| 2021-11-08-13:29:18.779374 | 1.2.2.dev1+g23f68a4.main | herapost-master.aoc.nrao.edu | settings.yaml | 2 | 10 | 1 | 50G | 21 | 500 | 256 | 3072 | 1 | 2.053 | 0:11:56.878661 |
| 2021-11-08-11:38:08.304418 | 1.2.2.dev25+gef5c800.fix_uvdata_ordering | herapost-master.aoc.nrao.edu | settings.yaml | 2 | 10 | 1 | 50G | 21 | 500 | 256 | 3072 | 1 | 2.061 | 0:11:46.983807 |
I'm also running reference sim 2.1 on the NRAO cluster to check timings. I'll post an update when I have one.
With an N of 1, reference sim 2.1 execution time on the NRAO cluster differed by 6 minutes (longer on this branch) out of 516 minutes (8.6 hours). I don't think that's really an appreciable difference.
Really, that's better. Gives you time to make coffee.
Forces time-baseline ordering before a simulation.
Motivation and Context
Time-baseline ordering was assumed in the task index creation; this change enforces it. It attempts to re-order back to the original ordering if that ordering was well defined before the simulation. Closes #370.
Types of changes
Checklist:
For all pull requests:
Bug fix checklist: