MWATelescope / cotter

André Offringa's cotter pre-processing pipeline

Switch to use AOFlagger 3 API #14

Closed: aroffringa closed this 4 years ago

aroffringa commented 4 years ago

This should only be merged to master once aoflagger 3 is released (somewhere in the next month or so). This fixes #12 .

aroffringa commented 4 years ago

I have tested this branch against aoflagger 3.0-alpha and this seems to work ok. It's to be merged once aoflagger 3 has been released.

aroffringa commented 4 years ago

Hi @gsleap , since you seem to be involved in Cotter, maybe you can coordinate this merge request. I've released aoflagger 3 (repository was migrated to https://gitlab.com/aroffringa/aoflagger), which is not backwards compatible. It has quite big changes, for example all flagging strategies (including for the MWA) were rewritten in Lua. I've done basic comparisons and everything seems to work.
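For reference, here is a rough sketch of how the new Lua strategies can be exercised through AOFlagger 3's Python bindings (cotter itself uses the equivalent C++ API). The method and enum names below are from memory of the v3 interface and should be checked against the current aoflagger documentation:

```python
# Minimal sketch of driving an AOFlagger 3 Lua strategy from Python.
# Interface names (AOFlagger, TelescopeId, find_strategy_file, load_strategy_file,
# make_image_set, set_image_buffer, run, get_buffer) are from memory of the v3
# Python bindings; verify against the aoflagger docs before relying on this.
import numpy as np
import aoflagger as aof

n_times, n_channels, n_buffers = 100, 256, 8

flagger = aof.AOFlagger()
# Locate and load the built-in Lua strategy for the MWA
strategy_path = flagger.find_strategy_file(aof.TelescopeId.MWA)
strategy = flagger.load_strategy_file(strategy_path)

# Fill an image set with (here: random) values, one buffer per
# real/imaginary part of each polarisation
data = flagger.make_image_set(n_times, n_channels, n_buffers)
for image_index in range(n_buffers):
    data.set_image_buffer(image_index, np.random.normal(0, 1, [n_channels, n_times]))

# Run the Lua strategy and retrieve the resulting flag mask
flag_mask = strategy.run(data)
flags = flag_mask.get_buffer()
print('Flagged fraction:', flags.mean())
```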

I'd like to merge this PR when possible, but it might be good to have a bit more testing of different use-cases before rolling this out into full production mode. Flagging should be as good as it was, but might not be a "pixel perfect" copy of the old strategy, i.e. samples that are near the threshold of being flagged might now be flagged or no longer be flagged, making the flag mask a bit different from the old one.

gsleap commented 4 years ago

Hi @aroffringa, I guess I am now involved in cotter :-) I will try to find time to look at this next week. I might ask @marcinsokolowski to help out on this too if I ask nicely. We've both been very busy of late, but we'll try to get to this soon!

gsleap commented 4 years ago

Hi Andre,

I have done a little testing on this. I did my testing via Docker since that made it easier to build both aoflagger and cotter. As an aside, there were several compile warnings when building your aoflagger3 Docker image, but no errors, so all good.

For my testing I created flag file output for the following three MWA observations:

(Scroll to end to read how I did the testing)

1110103576

AOFlagger3 vs 2.14 was very similar, with aoflagger3 flagging a bit more. Averaging a3/a2 in a randomly picked coarse channel, over all timesteps and baselines, resulted in a3/a2 ~ 1.001.

1231447224

The analysis of aoflagger3 vs 2.14 is very different for this observation. Averaging a3/a2.14 in a randomly picked coarse channel, over all timesteps and baselines, resulted in a3/a2 ~ 0.3, which indicates that aoflagger3 flagged much less than aoflagger 2.14. See the attached plot (1231447224_aoflagger3v2) showing a3/a2 for each fine channel in one of the coarse channels; a y-axis value of 1 indicates they flagged the same. In this plot you can see the edge channels and the DC channel flagged the same, but the rest of the channels are very different. In the second half of the band the flags from v3 and v2.14 become more alike.

1242546960

I know Sun observations are not ideal for RFI testing, but I had this obs at hand. Doing the same comparison, a3/a2.14 was about 0.8, meaning aoflagger3 flagged about 20% less than aoflagger 2.14.

So, I'm not sure but I would not have expected such big differences in 1231447224 and 1242546960. You may want to look into these and see what's going on?

In the meantime I'll identify and test some other cases too.

Cheers

Greg


How I tested

Just a note about my stats: here is how I arrived at these numbers:

  1. run cotter 4.3 with aoflagger2.14 against target observation and create mwaf files.
  2. run cotter 4.3 with aoflagger3 against target observation and create mwaf files.

my cotter command line was: cotter -j 38 -absmem 350 -allowmissing -initflag 2 -m ${OBS_ID}.metafits -o ${OUTPUT_PATH}/${OBS_ID}_%%.mwaf *gpubox*.fits

  3. I adapted some Python I found here to read a single flag file and perform some summaries. I modified the code to average flags across all timesteps for a single coarse channel and then dump out the average per tile (all the baselines where the tile is antenna1), plus another dump averaging flags across all timesteps for a single coarse channel for each fine channel. At the end I print a final average across all timesteps, baselines and fine channels. Dividing the final average using aoflagger3 by the final average using aoflagger2.14, the ratio should be 1.0 if they flagged the same; if it is <1 then aoflagger3 has flagged less than 2.14 (see the sketch below).
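For reference, a rough sketch of that comparison (not the exact script I ran). It assumes the usual cotter mwaf layout: primary-header keywords NSCANS, NANTENNA and NCHANS, and a binary table whose FLAGS column holds one row of per-fine-channel flags for each (timestep, baseline), time-major and including autocorrelations. The file names are hypothetical; check your own mwaf files before trusting this.

```python
# Sketch only: compute per-fine-channel flag occupancy from a cotter mwaf file
# and compare two flagger versions. Layout assumptions as described above.
import numpy as np
from astropy.io import fits

def flag_occupancy(mwaf_path):
    """Mean flag fraction per fine channel, averaged over time and baseline."""
    with fits.open(mwaf_path) as hdus:
        hdr = hdus[0].header
        n_ant = hdr['NANTENNA']
        n_scans = hdr['NSCANS']
        n_chan = hdr['NCHANS']
        n_baselines = n_ant * (n_ant + 1) // 2   # assumes autos are included
        flags = np.asarray(hdus[1].data['FLAGS'], dtype=bool)
        flags = flags.reshape(n_scans, n_baselines, n_chan)
    return flags.mean(axis=(0, 1))               # one value per fine channel

# Hypothetical file names for the two runs of the same coarse channel
occ_v3 = flag_occupancy('1231447224_01_aoflagger3.mwaf')
occ_v2 = flag_occupancy('1231447224_01_aoflagger2.mwaf')
ratio = occ_v3 / occ_v2   # ~1.0 => same amount of flagging, <1 => v3 flags less
print(ratio.mean())
```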
aroffringa commented 4 years ago

Thanks for these extensive tests @gsleap . I suspect the differences are not serious, but I would like to test it. I don't have easy access to MWA platforms anymore though, and I think I can't download MWA sets to my server either. What's the easiest way for me to download those observations?

gsleap commented 4 years ago

Easiest way is to register at the MWA All Sky Virtual Observatory site. Ah, I just checked and you already have an account: the login is "aroffringa". If needed, you should be able to reset your password from the login page if you have forgotten it.

Once logged in, go to "My Jobs", then "New Data Job", click on "Data Download Job" and paste in the obs_id. Once the job has completed, a zip of the raw gpubox files will be downloadable via the browser or a wget URL. Alternatively you can use manta_ray_client, a Python CLI, to do this too.

Let me know if you have any problems!

aroffringa commented 4 years ago

Hi @gsleap , I've started with testing 1231447224 (downloading takes a full night). I see differences but they seem a bit different to what you report. I agree however that 1231447224 shows some bigger differences than one might expect. It is however an awkward observation: the gpubox files for 1231447224 (at least the zip that I received) are incomplete. Cotter reports:

Missing information from GPU box 5, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 17, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 18, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 19, timerange 0. Maybe you are missing an input file?
Missing information from GPU box 5, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 17, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 18, timerange 1. Maybe you are missing an input file?
Missing information from GPU box 19, timerange 1. Maybe you are missing an input file?

and it actually only continues because you have specified -allowmissing, which is not the (at least not my) general advice. The newer AOFlagger can take into account that data from the correlator is incomplete, i.e. that part of it is missing. When calculating flagging thresholds relative to the noise, it will ignore those data, leading to a more stable, but also stricter, threshold. This is reflected in the total flags:

Cotter with aoflagger2 flags 11.9% in this set; with aoflagger3 it flags 12.6%. The total percentage is thus not shockingly different, although the flags themselves are quite different. I would say that aoflagger3 behaves better in this respect, as it is less sensitive to the missing data. When comparing the flags you probably shouldn't include data that is missing, as its flagging status is more or less random and it doesn't affect the outcome.
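As a minimal sketch of that kind of comparison, assuming you already have the two flag cubes and a boolean mask of which samples actually contained data (e.g. derived from which gpubox files are present):

```python
# Sketch only: compare flag fractions while ignoring samples that never had data.
# flags_v2, flags_v3 and present are assumed to be boolean arrays of shape
# (n_timesteps, n_baselines, n_channels); 'present' marks samples with real data.
import numpy as np

def flag_fraction(flags: np.ndarray, present: np.ndarray) -> float:
    """Fraction of flagged samples, counted only where data is actually present."""
    return flags[present].mean()

# ratio = flag_fraction(flags_v3, present) / flag_fraction(flags_v2, present)
# ~1.0 means both versions flag a similar amount of the real data.
```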

I hadn't expected this to be much of an issue or difference for the MWA, because I expect that most sets should have all their visibilities -- but in sets like these there is indeed some difference. I've also looked at some of the baselines to check whether the flagging worked correctly. Here are comparisons for three baselines, first the aoflagger2 result, then the aoflagger3 one: Tile011x012-aoflagger2 Tile011x012-aoflagger3 Tile025x026-aoflagger2 Tile025x026-aoflagger3 Tile087x091-aoflagger2 Tile087x091-aoflagger3

Black patches are flagged. Aoflagger3 is more sensitive for this particular observation, leading to a lot more point- or line-like features outside the big black patches. As you can see, this observation also leads to quite unstable flagging. This is because the lower band of the observation is for some reason much, much brighter than the rest. I don't know if this is because of wrong subband gain settings or because of some kind of smooth, broadband RFI. However, AOFlagger isn't really able to catch continuous, smooth broadband RFI -- you need different algorithms for that, such as the ones based on the total statistics that I showed in the MWA RFI paper. This is a deliberate separation: such algorithms can easily look at the averaged visibilities instead of the high-resolution visibilities to detect such problems. AOFlagger's strength/task is to look at the high time/frequency resolution data to find sharp, faint RFI that would become invisible after averaging.

So for this observation, AOFlagger sometimes flags the broadband feature in a given baseline and sometimes it doesn't. The fact that a part of the band is missing probably makes this even worse. This instability happens in both aoflagger2 and 3, except that the baselines for which it happens vary. In short, this instability is inherent to the algorithm, but it should not be an issue in normal observations.

So, given that this observation behaves "unstably", aoflagger 3 is overall a bit more stable, but both flaggers have issues with this observation. Bottom line: for further processing, scientists should still do the same things and can expect the same or slightly better data quality.

I ran cotter like this: cotter -allowmissing -m 1231447224.metafits -use-dysco 1231*gpubox*.fits

I'll download your sun observation as well just to check if that results in the same conclusion.

gsleap commented 4 years ago

Thanks @aroffringa for that detailed response. I didn't even think about the fact that this observation is missing 4 gpubox channels (I think we had 4 gpuboxes offline for a short while around the time of this observation). And good point re: not including missing data when comparing flags; my approach was pretty naive and didn't take that into account! The plots you showed are pretty nice, I assume they are from the aoflagger GUI? I should go and have a play with that too!

aroffringa commented 4 years ago

Hi @gsleap . Indeed, those plots are made with rfigui, which is part of the AOFlagger software. For analysis, the aoqplot tool can also be very handy; it displays the flagging statistics interactively and gives a quick impression of the flagging situation. For these observations it shows, for example, that there are tiles that are not working at all but which have not yet been flagged by the observatory. aoqplot uses the summary information that cotter writes to the measurement set; that info can also be retrieved manually, e.g. with Python casacore, for comparison. Those statistics also take missing data etc. into account. I had written some scripts that used those statistics to e.g. find the presence of DAB, and there was a script to automatically find those non-working antennas.
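A rough sketch of pulling those statistics out of the measurement set with python-casacore is below. The subtable and column names (QUALITY_KIND_NAME, QUALITY_FREQUENCY_STATISTIC, KIND, NAME, FREQUENCY, VALUE) are assumptions based on how AOFlagger stores its quality statistics; check them with colnames() on your own MS.

```python
# Sketch: read AOFlagger/cotter quality statistics directly from the MS subtables.
# Subtable and column names are assumptions; verify on your own measurement set.
from casacore.tables import table

ms = 'obs.ms'  # hypothetical path to a cotter-produced measurement set

# Map statistic-kind indices to human-readable names (e.g. "RFIRatio", "Count")
kinds_table = table(ms + '/QUALITY_KIND_NAME', ack=False)
kind_names = dict(zip(kinds_table.getcol('KIND'), kinds_table.getcol('NAME')))
kinds_table.close()

# Per-channel statistics, the same information aoqplot shows in its frequency plots
freq_table = table(ms + '/QUALITY_FREQUENCY_STATISTIC', ack=False)
frequencies = freq_table.getcol('FREQUENCY')
kinds = freq_table.getcol('KIND')
values = freq_table.getcol('VALUE')  # one value per polarisation
freq_table.close()

for freq, kind, value in zip(frequencies, kinds, values):
    print(freq, kind_names.get(kind, kind), value)
```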

I've analyzed the Sun set (pun intended) 1242546960. This is again a somewhat non-standard set (I hope), as it seems to be missing a lot of its data at the end of the observation. This is how the visibilities look before flagging: Original-tile051x052. I hadn't appreciated how much missing data affects the "old" style flagging: with cotter + aoflagger2, these data are flagged quite badly. With that combination, missing data with zero visibility values are still "considered" for flagging. This leads to the situation that aoflagger2 is triggered by the jump from non-zero to zero values, and then applies a morphological extension to the missing-data flags, which causes too many flags: AOFlagger2-tile051x052. This problem is fixed in the new Cotter + AOFlagger 3: AOFlagger3-tile051x052. This is only one baseline, but it is the same for every baseline. (A more detailed comparison between aoflagger 2 and 3 is a bit too much to write up here, but I intend to write it up in an article.)

If I look at the total statistics, it's also nice to see that AOFlagger identifies a weak transmitter: AOFlagger3-rfi-spectrum (this is a plot from aoqplot, btw :) ). The spikes between 157-158 MHz, and the weaker one between 158-159 MHz, are certainly RFI. These seem not to be as clearly detected by AOFlagger 2, probably because the combination of zero and high visibility values decreases the sensitivity.

In conclusion, I think these changes can be safely merged, and edge cases such as these will have improved flagging. In the more common cases, the difference from the original strategy should be small.

gsleap commented 4 years ago

Hi @aroffringa - that was a great explanation, thank you (and for the pun!). I agree; I think my test method was flawed and used data with quite a lot of missing data, which also explains how my naive comparison showed such large differences. It indeed looks like aoflagger3 does a great job in these situations, so I am happy with this. P.S. I bumped the version to 4.5 (the -apply/-full-apply was 4.4); I'll tag them properly today.

gsleap commented 4 years ago

Hi @aroffringa - actually, I will hold off on tagging this version as 4.5 until the Debian/Ubuntu package for aoflagger3 is available. Once it is, I'll do a final test of the Docker build of cotter (which installs the aoflagger package) and, if that works OK, I will create the 4.5 tag and release. Does this sound OK?

aroffringa commented 4 years ago

(I was on holiday) -- Yes, sure, you can tag a version as/when you want.