caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

Consequences of new workflow for calibrators' flags #914

Open paoloserra opened 4 years ago

paoloserra commented 4 years ago

The new workflow implemented in https://github.com/ska-sa/meerkathi/pull/892 allows users to run the pipeline without ever modifying the input .MS. Among other things, the calibrators are first split off the input .MS, then flagged. This is different from the old workflow, where the calibrators were flagged from within the input .MS. (Note that the old workflow can still be used -- more below.)

The new workflow has consequences for the calibrators' flags. AOflagger now runs on an .MS with no target scans in between the calibrator scans. Secondary calibrator scans, which were originally separated by target scans, are now processed together by AOflagger, without respecting the original scan boundaries. This results in different flags compared to when the calibrators were flagged from the input .MS, where the scan boundaries were respected.

Below I compare the flags obtained with the two workflows.

New workflow config file:

transform_data:
  enable: true
  label_in: ''
  label_out: 'cal'
  split_field:
    enable: true
    field: 'calibrators'
    column: data
flagging:
  enable: true
  label_in: 'cal'
  field: 'calibrators'
  load_flags:
    enable: false
  flag_autocorr:
    enable: false
  quack_flagging:
    enable: false
    quackinterval: 7
  flag_shadow:
    enable: false
  autoflag_rfi:
    enable: true
    flagger: aoflagger
    strategy: firstpass_Q.rfis

New workflow pipeline run:

...
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_before_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_before_automatic_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/autoflagger' to recipe. The container will be named 'autoflag_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_after_automatic_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagdata' to recipe. The container will be named 'flagging_summary_flagging_0_cal_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_after_flagging_0_0'
INFO - Running worker flagging_worker
...
meerkathi - 2020-04-07 12:31:38,269 CRITICAL - INFO:STIMELA-5:Running job autoflag_flagging_0_0
meerkathi - 2020-04-07 12:31:38,270 CRITICAL - INFO:STIMELA-5:STEP 3 :: autoflag_flagging_0_0:: Auto-flagging flagging pass ms=gps_cal.ms fields=J0408-6545,J0825-5010
...
INFO    Summary::getResult   field J0825-5010 flagged: 377186 total: 1.188e+06 (31.7%)
...
INFO - Finished worker flagging_worker
INFO - PIPELINER EXITS WITH RETURN CODE 0

Old workflow config file:

transform_data:
  enable: false
flagging:
  enable: true
  label_in: ''
  field: 'calibrators'
  load_flags:
    enable: false
  flag_autocorr:
    enable: false
  quack_flagging:
    enable: false
    quackinterval: 7
  flag_shadow:
    enable: false
  autoflag_rfi:
    enable: true
    flagger: aoflagger
    strategy: firstpass_Q.rfis

Old workflow pipeline run:

...
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_before_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_before_automatic_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/autoflagger' to recipe. The container will be named 'autoflag_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_after_automatic_flagging_0_0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagdata' to recipe. The container will be named 'flagging_summary_flagging_0__0'
CRITICAL - INFO:STIMELA-5:Adding cab 'pserra_cab/casa_flagmanager' to recipe. The container will be named 'save_flags_after_flagging_0_0'
INFO - Running worker flagging_worker
...
CRITICAL - INFO:STIMELA-5:Running job autoflag_flagging_0_0
CRITICAL - INFO:STIMELA-5:STEP 3 :: autoflag_flagging_0_0:: Auto-flagging flagging pass ms=gps.ms fields=J0408-6545,J0825-5010
...
CRITICAL - 2020-04-07 10:35:24  INFO    Summary::getResult   field J0825-5010 flagged: 201404 total: 1.188e+06 (17%)

So the new workflow flags 31.7% of the secondary visibilities, compared to the 17% of the old workflow.
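As a quick cross-check of these percentages, the two summary lines can be parsed and the flagged fraction recomputed. This is only a sketch: the regular expression and the `flagged_fraction` helper are hypothetical names written for this comparison, not part of CARACal or CASA, and the total is taken as printed in the log (rounded to 1.188e+06), so the last digit of the result is approximate.

```python
import re

# Summary lines copied from the two pipeline logs above.
NEW_LOG = "INFO    Summary::getResult   field J0825-5010 flagged: 377186 total: 1.188e+06 (31.7%)"
OLD_LOG = "CRITICAL - 2020-04-07 10:35:24  INFO    Summary::getResult   field J0825-5010 flagged: 201404 total: 1.188e+06 (17%)"

# Hypothetical pattern for this comparison; not part of CARACal or CASA.
SUMMARY_RE = re.compile(r"field\s+(\S+)\s+flagged:\s*([\d.eE+]+)\s+total:\s*([\d.eE+]+)")


def flagged_fraction(line):
    """Return (field, flagged, total, fraction) parsed from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no flagging summary found in line")
    field = match.group(1)
    flagged = float(match.group(2))
    total = float(match.group(3))  # rounded in the log, so the fraction is approximate
    return field, flagged, total, flagged / total


for label, line in (("new", NEW_LOG), ("old", OLD_LOG)):
    field, flagged, total, fraction = flagged_fraction(line)
    print(f"{label} workflow: {field} {100 * fraction:.1f}% flagged")
```

With the numbers above, the new workflow flags a bit less than twice as many visibilities of J0825-5010 as the old one.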

Below is a visual example for one particular baseline (left = new workflow; right = old workflow).

Two questions: is there a way to tell AOflagger to respect scan boundaries? And is this problem specific to AOflagger, or is Tricolour affected too?

KshitijT commented 4 years ago

Two questions: is there a way to tell AOflagger to respect scan boundaries? Is this problem specific to AOflagger only, and Tricolour won't be affected?

Answer 1: No way to do that, as far as I know. Also, it has been a while since I took a look at the gory details of the flagging worker, but should it not be flagging only the calibrators in the first round anyway? The target is flagged separately in flagging__2? We could add field specifiers to the aoflagger call or, if you really want to preserve the old behaviour, run aoflagger inside transform_data with the current strategies, before the data are transformed. Yet another option is to use new strategies which do a better job.

Answer 2: I don't really know if Tricolour is affected; @bennahugo would know best.

paoloserra commented 4 years ago

Also, it has been a while since I took a look at the gory details of the flagging worker, but should it not be flagging only the calibrators in the first round anyway? The target is flagged separately in flagging__2? We could add field specifiers to the aoflagger call or, if you really want to preserve the old behaviour, run aoflagger inside transform_data with the current strategies, before the data are transformed. Yet another option is to use new strategies which do a better job.

@KshitijT yes, the target is flagged separately. This issue is entirely about the flagging of the calibrators, not of the target.

The target is relevant because, if you flag the secondary calibrator from an .MS which contains the target, the target scans separate the calibrator's scans, and AOflagger flags those scans independently of one another. In contrast, if you flag the secondary from an .MS which no longer contains the target, AOflagger puts all of the calibrator's scans together before flagging them, which gives different flags -- as shown above.
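To make the argument concrete, here is a toy sketch of how joining scans can change the flags. It deliberately does not use AOflagger's algorithm: it swaps in a simple median/MAD outlier cut, and the amplitudes are made-up numbers, so it only illustrates one conceivable mechanism (statistics estimated over joined scans differ from per-scan statistics), not what AOflagger actually does internally.

```python
import numpy as np


def mad_outliers(amplitudes, n_sigma=5.0):
    """Flag samples deviating from the median by more than n_sigma robust sigmas."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    median = np.median(amplitudes)
    deviation = np.abs(amplitudes - median)
    # 1.4826 * MAD approximates the standard deviation for Gaussian noise.
    sigma = 1.4826 * np.median(deviation)
    return deviation > n_sigma * sigma


# Hypothetical amplitudes: two clean scans of the same calibrator whose
# mean level differs slightly (for example because of gain drift).
scan_1 = np.array([1.00, 1.01, 0.99] * 4)
scan_2 = np.array([1.30, 1.31, 1.29, 1.30])

# Old-workflow analogue: each scan is flagged on its own statistics.
per_scan = np.concatenate([mad_outliers(scan_1), mad_outliers(scan_2)])
# New-workflow analogue: the scans are joined before flagging.
joined = mad_outliers(np.concatenate([scan_1, scan_2]))

print("per-scan flags:", int(per_scan.sum()))
print("joined flags:  ", int(joined.sum()))
```

In this toy case neither scan is flagged on its own, while the joined data lose the whole second scan, because the statistics are dominated by the longer first scan.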

Let me know whether that makes sense.

KshitijT commented 4 years ago

Yes, it makes sense. I was merely pointing out that we should be flagging only the calibrators in the first round anyway. Of course, the presence of intervening scans seems to make a difference to aoflagger. From the case above, I am not sure aoflagger is actually doing a worse job; it might be that it flags better now. So do we want to go back to the old flagging behaviour? If so, we need to either change the flagging strategies or move the aoflagger run to before the mstransform run.

SpheMakh commented 4 years ago

This is definitely a bug in AOflagger. It should respect scan boundaries, or at least be able to tell that there is missing time between the scans. @sjperkins tricolour will respect scan boundaries right?
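Detecting "missing time" between scans is straightforward in principle. The sketch below splits a sorted list of timestamps wherever the step between consecutive samples is much larger than the typical integration time. The function name, the `gap_factor` threshold and the example timestamps are all assumptions made for illustration; this is not how AOflagger or CARACal handle time gaps.

```python
import numpy as np


def split_on_time_gaps(times, gap_factor=5.0):
    """Split sorted timestamps wherever a step exceeds gap_factor times the median step."""
    times = np.asarray(times, dtype=float)
    if times.size < 2:
        return [times]
    steps = np.diff(times)
    threshold = gap_factor * np.median(steps)
    # A break at index i means a new chunk starts at times[i].
    breaks = np.nonzero(steps > threshold)[0] + 1
    return np.split(times, breaks)


# Two hypothetical 5-sample calibrator scans with 8 s integrations,
# separated by a stretch during which the target was observed.
scan_a = 8.0 * np.arange(5)
scan_b = 600.0 + 8.0 * np.arange(5)
chunks = split_on_time_gaps(np.concatenate([scan_a, scan_b]))
print([len(chunk) for chunk in chunks])  # prints [5, 5]
```

Even after the target scans have been split off, the gap in the TIME values remains, so a flagger could recover the scan boundaries this way without reading SCAN_NUMBER.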

paoloserra commented 4 years ago

Yes, it makes sense, I was merely pointing out that we should be flagging only the calibrators in the first round anyway.

But I was doing that. Sorry @KshitijT, I'm missing something! :(

Of course, the presence of intervening scans seems to make a difference to aoflagger. From the case above, I am not sure aoflagger is actually doing a worse job; it might be that it flags better now. So do we want to go back to the old flagging behaviour? If so, we need to either change the flagging strategies or move the aoflagger run to before the mstransform run.

Well, I'm not really sure what the right thing to do is. I'm just not happy with the fact that there is a difference. :)

sjperkins commented 4 years ago

This is definitely a bug in AOflagger. It should respect scan boundaries, or at least be able to tell that there is missing time between the scans. @sjperkins tricolour will respect scan boundaries right?

Yes, it specifically partitions on SCAN_NUMBER, FIELD_ID and DATA_DESC_ID
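The idea of partitioning on those three columns can be sketched with plain NumPy. This is not tricolour's actual code: `partition_rows` is a hypothetical helper and the column values are made up, assuming a split-off calibrator .MS that kept the original scan numbers (scans 2 and 4 were target scans and are gone).

```python
import numpy as np


def partition_rows(scan_number, field_id, data_desc_id):
    """Map each unique (SCAN_NUMBER, FIELD_ID, DATA_DESC_ID) triple to its row indices."""
    keys = np.stack([scan_number, field_id, data_desc_id], axis=1)
    unique_keys, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # the shape of `inverse` varies between NumPy versions
    return {
        tuple(int(value) for value in key): np.nonzero(inverse == index)[0]
        for index, key in enumerate(unique_keys)
    }


# Hypothetical per-row columns: field 0 is the primary calibrator,
# field 1 the secondary, and there is a single spectral window.
scan_number = np.array([1, 1, 3, 3, 3, 5, 5])
field_id = np.array([0, 0, 1, 1, 1, 1, 1])
data_desc_id = np.zeros(7, dtype=int)

groups = partition_rows(scan_number, field_id, data_desc_id)
for key, rows in groups.items():
    print(key, rows.tolist())
```

Rows of scans 3 and 5 are adjacent in this example, yet they end up in separate groups, so a flagger working group by group would never join them.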

paoloserra commented 4 years ago

This is definitely a bug in AOflagger. It should respect scan boundaries, or at least be able to tell that there is missing time between the scans. @sjperkins tricolour will respect scan boundaries right?

Yes, it specifically partitions on SCAN_NUMBER, FIELD_ID and DATA_DESC_ID

I can confirm that. I've just run the same test as above and with tricolour there is no difference between new and old workflow.

SpheMakh commented 4 years ago

How does the quality of the flagged data from tricolour compare to AOFlagger's? If the difference is not too big we should switch to tricolour. Or we could just replicate the current AOFlagger flagging strategies in the tricolour standard and make the switch either way.

KshitijT commented 4 years ago

But I was doing that. Sorry @KshitijT, I'm missing something! :(

My bad, the cross cal does flag just the calibrators. Sorry!

paoloserra commented 4 years ago

How is the quality of the flagged data from tricolour compared to AOFlagger? If the difference is not too big we should switch to tricolour. Or just replicate current flagging AOFlagger strategies in the tricolour standard and do the switch either way.

That'll take a while to establish. Anyway, on to learning about the Tricolour options in Caracal!

Athanaseus commented 10 months ago

See if this is still an issue with the new AOFlagger release.