Closed by aroffringa 4 years ago
I have tested this branch against aoflagger 3.0-alpha and this seems to work ok. It's to be merged once aoflagger 3 has been released.
Hi @gsleap , since you seem to be involved in Cotter, maybe you can coordinate this merge request. I've released aoflagger 3 (the repository was migrated to https://gitlab.com/aroffringa/aoflagger), which is not backwards compatible. It includes quite big changes; for example, all flagging strategies (including the MWA one) were rewritten in Lua. I've done basic comparisons and everything seems to work.
I'd like to merge this PR when possible, but it might be good to have a bit more testing of different use-cases before rolling this out into full production mode. Flagging should be as good as it was, but it might not be a "pixel-perfect" copy of the old strategy: samples near the flagging threshold might now be flagged or no longer be flagged, making the flag mask a bit different from the old strategy.
Hi @aroffringa, I guess I am now involved in cotter :-) I will try to find time to look at this next week. I might ask @marcinsokolowski to help out on this too if I ask nicely. We've both been very busy of late, but we'll try to get to this soon!
Hi Andre,
I have done a little testing on this. I did my testing via docker, since that made it easier to build both aoflagger and cotter. Also, as an aside, there were several compile warnings when building your aoflagger3 docker image; however, there were no errors, so all good.
For my testing I created flag file output for the following 3 MWA observations:
(Scroll to end to read how I did the testing)
AOFlagger 3 vs 2.14 was very similar, with aoflagger3 flagging a bit more. Averaging a3/a2 in a randomly picked coarse channel, over all timesteps and baselines, resulted in a3/a2 ~ 1.001.
The analysis of aoflagger3 vs 2.14 is very different for this observation. Averaging a3/a2.14 in a randomly picked coarse channel, over all timesteps and baselines, resulted in a3/a2.14 ~ 0.3, so aoflagger3 flagged much less than aoflagger 2.14. See the attached plot showing a3/a2.14 for each fine channel in one of the coarse channels; a y value of 1 indicates they flagged the same. In this plot you can see the edge channels and DC channel flagged the same, but the rest of the channels are very different. Then, in the second half of the band, the flags from v3 versus v2.14 become more alike.
I know Sun observations are not ideal for RFI testing, but I had this obs at hand. Doing the same comparison, a3/a2.14 was ~0.8, meaning aoflagger3 flagged about 20% less than aoflagger 2.14.
So I'm not sure, but I would not have expected such big differences for 1231447224 and 1242546960. You may want to look into these and see what's going on?
In the meantime I'll identify and test some other cases too.
Cheers
Greg
Just a note about my stats- here is how I arrived at these numbers:
my cotter command line was: cotter -j 38 -absmem 350 -allowmissing -initflag 2 -m ${OBS_ID}.metafits -o ${OUTPUT_PATH}/${OBS_ID}_%%.mwaf *gpubox*.fits
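For reference, here is a minimal sketch of that comparison, assuming cotter's mwaf layout (a boolean FLAGS column, one row per timestep/baseline, in the first binary-table extension); the file paths are hypothetical and the column name should be checked against your own files:

```python
# Sketch only: compare per-fine-channel flag occupancy between two mwaf files.
# Assumes a boolean FLAGS column of shape (nrows, nchans) in HDU 1.
import numpy as np
from astropy.io import fits

def flag_occupancy(mwaf_path):
    """Mean flag fraction per fine channel, averaged over all rows
    (i.e. over all timesteps and baselines) of one coarse channel."""
    with fits.open(mwaf_path) as hdus:
        flags = np.asarray(hdus[1].data["FLAGS"], dtype=bool)
    return flags.mean(axis=0)

occ_a2 = flag_occupancy("aoflagger2/1231447224_01.mwaf")  # hypothetical paths
occ_a3 = flag_occupancy("aoflagger3/1231447224_01.mwaf")

with np.errstate(divide="ignore", invalid="ignore"):
    ratio = occ_a3 / occ_a2  # a3/a2 per fine channel (inf/nan where a2 flags nothing)
print("mean a3/a2 over this coarse channel:", np.mean(ratio[np.isfinite(ratio)]))
```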
Thanks for these extensive tests @gsleap . I suspect the differences are not serious, but I would like to test it. I don't have easy access to MWA platforms anymore though, and I think I can't download MWA sets to my server either. What's the easiest way for me to download those observations?
The easiest way is to register at the MWA All Sky Virtual Observatory site. Ah, I just checked and you already have an account: the login is "aroffringa". If needed, you should be able to reset your password from the login page if you have forgotten it.
Once logged in, go to "My Jobs", "New Data Job", then click on the "Data Download Job", paste in the obs_id, then once it is completed, a zip of the raw gpubox files will be downloadable via browser or wget url - or you can use the manta_ray_client which is a python CLI to do this too.
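For scripted downloads there is also manta_ray_client; a rough sketch is below, but the job-description CSV grammar and the CLI flags are written from memory, so treat them as assumptions and check them against the client's README:

```python
# Rough sketch of a scripted MWA ASVO download via manta_ray_client (mwa_client).
# The CSV format and the --csv/--dir flags are assumptions; verify them against
# the manta-ray-client documentation before use.
import subprocess

obs_id = 1231447224
with open("jobs.csv", "w") as f:
    # One download job per line; "d" = download, "vis" = raw visibility files.
    f.write(f"obs_id={obs_id}, job_type=d, download_type=vis\n")

subprocess.run(["mwa_client", "--csv=jobs.csv", "--dir=."], check=True)
```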
Let me know if you have any problems!
Hi @gsleap , I've started with testing 1231447224 (downloading takes a full night). I see differences but they seem a bit different to what you report. I agree however that 1231447224 shows some bigger differences than one might expect. It is however an awkward observation: the gpubox files for 1231447224 (at least the zip that I received) are incomplete. Cotter reports:
Missing information from GPU box 5, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 17, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 18, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 19, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 5, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 17, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 18, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 19, timerange 1. Maybe you are missing an input file?
and it actually only continues because you have specified -allowmissing, which is not the general advice (at least not mine). The newer AOFlagger can take into account that data from the correlator is incomplete, i.e. that part of it is missing. When calculating flagging thresholds relative to the noise, it ignores those data, leading to a more stable, but also stricter threshold. This is reflected in the total flags:
Cotter with aoflagger2 flags 11.9% in this set; with aoflagger3 it flags 12.6%. The total percentage is thus not shockingly different, although the flags themselves are quite different. I would say that aoflagger3 behaves better in this respect, as it is less sensitive to the missing data. When comparing the flags you probably shouldn't include data that is missing, as its flagging status is more or less random but doesn't affect the outcome.
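To illustrate the effect (this is only a toy numpy example, not AOFlagger's actual algorithm): a noise estimate that treats zeroed, missing samples as real data gives a different threshold than one computed from the genuine samples only:

```python
# Toy illustration only (not AOFlagger's algorithm): estimate a noise-based
# flagging threshold once including zeroed (missing) samples and once
# ignoring them.
import numpy as np

rng = np.random.default_rng(1)
amps = np.abs(rng.normal(0.0, 1.0, 10000))  # fake visibility amplitudes
amps[:2500] = 0.0                           # pretend a quarter of the data is missing

def threshold(x, nsigma=5.0):
    """Median plus n times a robust (MAD-based) noise estimate."""
    med = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - med))
    return med + nsigma * sigma

print("threshold incl. zeros:", threshold(amps))             # distorted by the gap
print("threshold excl. zeros:", threshold(amps[amps != 0]))  # reflects the real noise
```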
I hadn't expected this to be much of an issue or difference for the MWA, because I'm expecting that most sets should have all their visibilities -- but in sets like these there is indeed some difference. I've also looked at some of the baselines to check if the flagging worked correctly. Here are comparisons for three baselines, first the aoflagger2 results, then the aoflagger3 ones:
Black patches are flagged. Aoflagger3 is more sensitive for this particular observation, leading to a lot more point- or line-like features outside the big black patches. As you can see, this observation also leads to quite unstable flagging. This is because the lower band of the observation is for some reason much, much brighter than the rest. I don't know if this is because of wrong subband gain settings or because of some kind of smooth, broadband RFI. However, AOFlagger isn't really able to catch continuous, smooth broadband RFI -- you need different algorithms for that, such as the ones based on the total statistics that I showed in the MWA RFI paper. This is a deliberate separation: such algorithms can easily look at the averaged visibilities instead of the high-resolution visibilities to find such problems. AOFlagger's strength/task is to look at the high time/frequency resolution data to find sharp, faint RFI that would become invisible after averaging.
So for this observation, AOFlagger sometimes flags the broadband feature in a given baseline and sometimes it doesn't. The fact that a part of the band is missing probably makes this even worse. This instability happens in both aoflagger2 and 3; only the baselines on which it happens vary. In short, this instability is inherent to the algorithm, but should not be an issue in normal observations.
So, given that this observation behaves "unstably", aoflagger 3 is overall a bit more stable, but both flaggers have issues with this observation. However, bottom line: for further processing, scientists should still do the same things and can expect the same or slightly better data quality.
I ran cotter like this:
cotter -allowmissing -m 1231447224.metafits -use-dysco 1231*gpubox*.fits
I'll download your sun observation as well just to check if that results in the same conclusion.
Thanks @aroffringa for that detailed response. I didn't even think about the fact that this observation is missing 4 gpubox channels (I think we had 4 gpuboxes offline for a short while around the time of this observation). And good point re: not including missing data when comparing flags; my approach was pretty naive and didn't take that into account! The plots you showed are pretty nice, I assume they are from the aoflagger gui? I should go and have a play with that too!
Hi @gsleap . Indeed, those plots are made with rfigui, which is part of the AOFlagger software. For analysis, the aoqplot tool can also be very handy; it displays the flagging statistics interactively and gives a quick impression of the flagging situation. For these observations it for example shows that there are tiles that are not working at all but which have not yet been flagged by the observatory. aoqplot uses the summary information that cotter writes to the measurement set; that info can also be retrieved manually, e.g. with Python casacore, for comparison. Those statistics also take into account missing data etc. I had written some scripts that used those statistics to e.g. find the presence of DAB, and there was a script to automatically find those non-working antennas.
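As an example, here is a rough python-casacore sketch of reading those statistics back; the QUALITY_* subtable, column, and kind names below are written from memory, so verify them against an actual measurement set (e.g. by listing its subtables) before relying on them:

```python
# Sketch only: read the quality statistics that cotter/aoflagger store in a
# measurement set. Table/column/kind names are assumptions to be verified.
from casacore.tables import table

ms = "1231447224.ms"  # hypothetical path

# Map statistic-kind indices to names (e.g. "Count", "RFICount", ...).
kinds = table(ms + "/QUALITY_KIND_NAME")
kind_name = dict(zip(kinds.getcol("KIND"), kinds.getcol("NAME")))
print("available statistic kinds:", sorted(kind_name.values()))

# Per-frequency statistics: print counts and RFI counts per channel.
freq_stats = table(ms + "/QUALITY_FREQUENCY_STATISTIC")
for row in freq_stats:
    name = kind_name.get(row["KIND"], "?")
    if name in ("Count", "RFICount"):
        print(row["FREQUENCY"], name, row["VALUE"])
```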
I've analyzed the Sun set (pun intended) 1242546960. This is again a somewhat non-standard set (I hope), as it seems to be missing a lot of its data at the end of the observation. This is how the visibilities look before flagging: I hadn't appreciated how much missing data affects the "old" style flagging: with cotter+aoflagger2, these data are flagged quite badly. With that combination, missing data with zero visibility values are still "considered" for flagging. This leads to the situation that aoflagger2 is triggered by the jump from non-zero to zero values, and then applies a morphological extension to the missing-data flags, which causes too many flags. This problem is fixed in the new Cotter + AOFlagger 3. This is only one baseline, but it is the same for every baseline. (A more detailed comparison between aoflagger 2/3 is a bit too much to write up here, but I intend to write it up in an article.)
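To make the mechanism concrete (this is only a toy numpy/scipy example, not AOFlagger's actual SumThreshold/SIR code): a block of zeroed samples creates sharp jumps, a simple threshold on the time-difference fires at those jumps, and a morphological dilation of the detections then spreads flags into the good data around the gap:

```python
# Toy illustration only: zeroed missing data triggers detections at the gap
# edges, and a morphological "extension" then widens them into good data.
import numpy as np
from scipy.ndimage import binary_dilation

rng = np.random.default_rng(2)
amps = np.abs(rng.normal(1.0, 0.1, 200))  # fake amplitudes for one channel
amps[120:160] = 0.0                       # missing data written as zeros

jump = np.abs(np.diff(amps, prepend=amps[0]))
detections = jump > 0.5                            # fires at the edges of the gap
flags = binary_dilation(detections, iterations=5)  # widens the detections

print("samples flagged around the gap:", int(flags.sum()), "of", len(amps))
```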
If I look at the total statistics, it's also nice to see that AOFlagger identifies a weak transmitter (this is a plot from aoqplot btw :) ). The spikes between 157 and 158 MHz, and the weaker one between 158 and 159 MHz, are certainly RFI. These seem not to be as clearly detected by AOFlagger 2, probably because the combination of zero and high visibility values decreases the sensitivity.
In conclusion, I think these changes can be safely merged, and they would give edge cases such as these improved flagging. In the more "common" cases, the difference from the original strategy should be small.
Hi @aroffringa - that was a great explanation, thank you (and for the pun!). I agree, I think my test method was flawed and used data with quite a lot of missing data, which also explains how my naive comparison showed such large differences. It indeed looks like aoflagger3 does a great job in these situations. So I am happy with this. p.s. I bumped the version to 4.5 (the -apply/-full-apply was 4.4) - I'll tag them properly today.
Hi @aroffringa - actually I will hold off on tagging this version as 4.5 until the debian / ubuntu package for aoflagger3 is available. Once it is, I'll do a final test of the docker build of cotter (which installs the aoflagger package) and, if that works OK, I will create the 4.5 tag and release. Does this sound ok?
(Was on holidays) -- Yes sure, you can tag a version as/when you want.
This should only be merged to master once aoflagger 3 is released (somewhere in the next month or so). This fixes #12.