caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0
28 stars 6 forks source link

Adding 'BPOLY' option to bpcal #1419

Open rubyvanrooyen opened 2 years ago

rubyvanrooyen commented 2 years ago

Update CARACal to enable BPOLY option in the crosscal worker. Initial development will be done on a fork of the repo and once validated we need to discuss where and how to add the updates to the pipeline

rubyvanrooyen commented 2 years ago
diff --git a/caracal/schema/crosscal_schema.yml b/caracal/schema/crosscal_schema.yml
index f47f1bda..b0bc50d7 100644
--- a/caracal/schema/crosscal_schema.yml
+++ b/caracal/schema/crosscal_schema.yml
@@ -171,6 +171,31 @@ mapping:
             example: ""
             required: false
             type: str
+          degamp:
+            desc: Polynomial degree for BPOLY amplitude solution
+            type: int
+            required: false
+            example: "3"
+          degphase:
+            desc: Polynomial degree for BPOLY phase solution
+            type: int
+            required: false
+            example: "3"
+          visnorm:
+            desc: Normalize data prior to BPOLY solution
+            type: bool
+            required: false
+            example: "False"
+          maskcenter:
+            desc: Number of channels in BPOLY to avoid in center of band
+            type: int
+            required: false
+            example: "0"
+          maskedge:
+            desc: Percent of channels in BPOLY to avoid at each band edge
+            type: int
+            required: false
+            example: "0"
           plotgains:
             desc: Plot gains with ragavi-gains. The .html plots are located in <output>/diagnostic_plots/crosscal/.
             type: bool
rubyvanrooyen commented 2 years ago
diff --git a/caracal/workers/crosscal_worker.py b/caracal/workers/crosscal_worker.py
index c0c60d12..0bb467bd 100644
--- a/caracal/workers/crosscal_worker.py
+++ b/caracal/workers/crosscal_worker.py
@@ -80,6 +80,12 @@ RULES = {
         "cab": "cab/casa_bandpass",
         "field": "bpcal",
     },
+    "P": {
+        "name": "bpoly_cal",
+        "interp": "linear",
+        "cab": "cab/casa_bandpass",
+        "field": "bpcal",
+    },
     "A": {
         "name": "auto_flagging",
         "cab": "cab/casa_flagdata",
@@ -184,6 +190,21 @@ def solve(msname, msinfo, recipe, config, pipeline, iobs, prefix, label, ftype,
             params["fillgaps"] = config[ftype]["b_fillgaps"]
             params["uvrange"] = config["uvrange"]
             params["scan"] = config[ftype]["scanselection"]
+        elif term == "P":
+            # gain plots break for new table format, need to be set to FALSE
+            if config[ftype]["plotgains"]:
+                raise RuntimeError("Ragavi gain plots not yet supported for BPOLY cal, ",
+                                   "set plotgains to false")
+            params["bandtype"] = 'BPOLY'
+            params["solnorm"] = config[ftype]["b_solnorm"]
+            params["fillgaps"] = config[ftype]["b_fillgaps"]
+            params["uvrange"] = config["uvrange"]
+            params["scan"] = config[ftype]["scanselection"]
+            params["degamp"] = config[ftype]["degamp"]
+            params["degphase"] = config[ftype]["degphase"]
+            params["visnorm"] = config[ftype]["visnorm"]
+            params["maskcenter"] = config[ftype]["maskcenter"]
+            params["maskedge"] = config[ftype]["maskedge"]
         elif term == "K":
             params["gaintype"] = term
             params["scan"] = config[ftype]["scanselection"]
@@ -244,7 +265,6 @@ def solve(msname, msinfo, recipe, config, pipeline, iobs, prefix, label, ftype,
                     output=pipeline.output,
                     label='smooth bandpass')

-
         # Assume gains were plotted when they were created
         if config[ftype]["plotgains"] and not can_reuse:
             plotgains(recipe, pipeline, field_id if term != "F" else None, caltable, iobs, term=term)
@@ -328,7 +348,7 @@ def solve(msname, msinfo, recipe, config, pipeline, iobs, prefix, label, ftype,
     # terms that need an apply
     groups_apply = list(filter(lambda g: g, re.findall("([AI]+)?", order)))
     # terms that need a solve
-    groups_solve = list(filter(lambda g: g, re.findall("([KGBF]+)?", order)))
+    groups_solve = list(filter(lambda g: g, re.findall("([KGBPF]+)?", order)))
     # Order has to start with solve group.
     # TODO(sphe) in the philosophy of giving user enough roap to hang themselves
     # Release II will allow both starting with I/A in case
rubyvanrooyen commented 2 years ago

@SpheMakh, @o-smirnov the basics are in the workers and in essence the CASA command will now apply the 'P' solution. Cleaning up some runtime errors

rubyvanrooyen commented 2 years ago

To apply the added 'P' rule to the primary calibrator worker simply change the rime from 'KGBAKGB' to 'KGPAKGP', which will now create bandpass calibration tables with a .P? extension rather than a .B? extension e.g. J1939-1625501775_sdp_l0-1gc_primary.P0 and J1939-1625501775_sdp_l0-1gc_primary.P1

rubyvanrooyen commented 2 years ago

The biggest issue is that the BPOLY calibration table uses a different table layout than the other calibration tables. This cause a couple of things to fail. For the moment, these are addressed symptomatically and code added to "hack" out the issues and get the pipeline to run.

Expected calibration table schema: image

BPOLY calibration table schema: image

rubyvanrooyen commented 2 years ago

This cause a Stimela exception

# Running CASA task 'bandpass'
# Successful readonly open of default-locked table /stimela_mount/output/G330-1625501775_sdp_l0-1gc_primary.B0: 49 columns, 61 rows
# Traceback (most recent call last):
#   File "/stimela_mount/code/run.py", line 23, in <module>
#     tab = table(gtab+"::FIELD")
#   File "/usr/local/lib/python3.6/dist-packages/casacore/tables/table.py", line 373, in __init__
#     Table.__init__(self, tabname, lockopt, opt)
# RuntimeError: Table /stimela_mount/output/G330-1625501775_sdp_l0-1gc_primary.B0::FIELD does not exist (subtable FIELD is unknown)

Fundamentally this can be addressed through a try/except block around the test, but it appears that only the BPOLY calibration table generated only populated the "FIELD_ID" column, with the "FIELD_NAME" column being empty. Thus the test is not relevant for this calibration table and skipped

diff --git a/stimela/cargo/cab/casa_bandpass/src/run.py b/stimela/cargo/cab/casa_bandpass/src/run.py
index d5f6016e..0eba322a 100644
--- a/stimela/cargo/cab/casa_bandpass/src/run.py
+++ b/stimela/cargo/cab/casa_bandpass/src/run.py
@@ -20,16 +20,25 @@ tab = table(gtab)
 field_ids = numpy.unique(tab.getcol("FIELD_ID"))
 tab.close()

-tab = table(gtab+"::FIELD")
-field_names = tab.getcol("NAME")
-tab.close()
-
 field_in = parameters_dict["field"].split(",")
-
 try:
-    ids = list(map(int, field_in))
-except ValueError:
-    ids = list(map(lambda a: field_names.index(a), field_in))
+    tab = table(gtab+"::FIELD")
+    field_names = tab.getcol("NAME")
+    tab.close()
+except RuntimeError:
+    # possible new table format
+    # sadly Field name and Source name columns are empty
+    # will need to figure this out, but ignoring the tests for now
+    tab = table(gtab)
+    field_names = numpy.unique(tab.getcol("FIELD_NAME"))
+    tab.close()
+    pass
+
+if field_names:
+    try:
+        ids = list(map(int, field_in))
+    except ValueError:
+        ids = list(map(lambda a: field_names.index(a), field_in))
+    if not set(ids).issubset(field_ids):
+        raise RuntimeError(f"Some field(s) do not have solutions after the calibration. Please refer to CASA {config.binary} logfile for further details")

-if not set(ids).issubset(field_ids):
-    raise RuntimeError(f"Some field(s) do not have solutions after the calibration. Please refer to CASA {config.binary} logfile for further details")
rubyvanrooyen commented 2 years ago

Another casualty of the differing calibration table schemas is the convenient plotgains option since this also causes Ragavi to fail. Thus the primary calibrator worker for 'P' currently requires plotgains: false

SpheMakh commented 2 years ago

@Mulan-94 this breaks the ragavi-gains plots, so could you please have a look at the new gain table format and make the required changes to ragavi.

rubyvanrooyen commented 2 years ago

ragavi gains plot error

# 2022-05-11 18:29:04: Running ragavi-gains --table /stimela_mount/msdir/J1939-1625501775_sdp_l0-1gc_primary.P0 --corr  --cmap coolwarm --field 0 --htmlname /stimela_mount/output/J1939-1625501775_sdp_l0-1gc_primary.P0 --plotname /stimela_mount/output/J1939-1625501775_sdp_l0-1gc_primary.P0.png
# 11.05.2022@18:29:09 - ragavi               - ERROR      - Oops ... uncaught exception occurred!
# 11.05.2022@18:29:09 - ragavi               - ERROR      - Traceback (most recent call last):
# 11.05.2022@18:29:09 - ragavi               - ERROR      -    File "/usr/local/bin/ragavi-gains", line 19, in <module>
# 11.05.2022@18:29:09 - ragavi               - ERROR      -     main(options=options)
# 11.05.2022@18:29:09 - ragavi               - ERROR      -    File "/usr/local/lib/python3.6/dist-packages/ragavi/ragavi.py", line 1703, in main
# 11.05.2022@18:29:09 - ragavi               - ERROR      -     freqs = vu.get_frequencies(tab).values / _GHZ_
# 11.05.2022@18:29:09 - ragavi               - ERROR      -    File "/usr/local/lib/python3.6/dist-packages/ragavi/utils.py", line 274, in get_frequencies
# 11.05.2022@18:29:09 - ragavi               - ERROR      -     "REF_FREQUENCY", "RESOLUTION"])
# 11.05.2022@18:29:09 - ragavi               - ERROR      -    File "/usr/local/lib/python3.6/dist-packages/daskms/dask_ms.py", line 326, in xds_from_table
# 11.05.2022@18:29:09 - ragavi               - ERROR      -     **kwargs).datasets()
# 11.05.2022@18:29:09 - ragavi               - ERROR      -    File "/usr/local/lib/python3.6/dist-packages/daskms/reads.py", line 281, in __init__
# 11.05.2022@18:29:09 - ragavi               - ERROR      -     raise ValueError("'%s' does not appear to be a CASA Table" % table)
# 11.05.2022@18:29:09 - ragavi               - ERROR      -  ValueError: '/stimela_mount/msdir/J1939-1625501775_sdp_l0-1gc_primary.P0::SPECTRAL_WINDOW' does not appear to be a CASA Table
# 2022-05-11 18:29:10: ragavi-gains exited with code 1
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR: docker returns error code 1
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR: job failed at 2022-05-11 18:29:10.501204 after 0:00:08.700572
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR: Traceback (most recent call last):
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:   File "/scratch/ruby/data_processing/bin/Stimela/stimela/recipe.py", line 713, in run
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:     job.run_job()
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:   File "/scratch/ruby/data_processing/bin/Stimela/stimela/recipe.py", line 430, in run_job
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:     self.job.start(output_wrangler=self.apply_output_wranglers)
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:   File "/scratch/ruby/data_processing/bin/Stimela/stimela/docker.py", line 167, in start
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:     kill_callback=lambda: utils.xrun("docker", ["kill", self.name]))
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:   File "/scratch/ruby/data_processing/bin/Stimela/stimela/utils/xrun_poll.py", line 227, in xrun
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR:     raise StimelaCabRuntimeError("{} returns error code {}".format(command_name, status))
2022-05-11 18:29:10 CARACal.Stimela.plotgains-P-0-0 ERROR: stimela.utils.StimelaCabRuntimeError: docker returns error code 1
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Completed jobs : ['version-J1939_crosscal__bpoly_before-ms0', 'set_model_cal-0', 'antenna_flag_summary', 'antenna_flag_summary', 'antenna_flag_summary', 'bpoly_cal-1gc-0-0-primary']
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Remaining jobs : ['apply_gains-fcal--0', 'auto_flagging-1gc-0-0-primary', 'antenna_flag_summary']
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Logging remaining task: apply_gains-fcal--0::Apply gain tables
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Logging remaining task: auto_flagging-1gc-0-0-primary::
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Logging remaining task: antenna_flag_summary:: Flagging summary  ms=1625501775_sdp_l0-cal.ms
2022-05-11 18:29:10 CARACal.Stimela.crosscal__bpoly INFO: Saving pipeline information in .last_crosscal__bpoly.json
2022-05-11 18:29:10 CARACal ERROR: Job 'plotgains-P-0-0:: Plot gaincal phase' failed: docker returns error code 1 [PipelineException]
2022-05-11 18:29:10 CARACal INFO:   More information can be found in the logfile at output-fcal-4k-bpoly/logs-20220511-174339/log-caracal.txt
2022-05-11 18:29:10 CARACal INFO:   You are running version 1.0.5-24-ga60628ba
2022-05-11 18:29:10 CARACal ERROR: Traceback (most recent call last):
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/main.py", line 187, in __run
2022-05-11 18:29:10 CARACal ERROR:     pipeline.run_workers()
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/workers/worker_administrator.py", line 441, in run_workers
2022-05-11 18:29:10 CARACal ERROR:     worker.worker(self, recipe, config)
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/workers/crosscal_worker.py", line 680, in worker
2022-05-11 18:29:10 CARACal ERROR:     prefix_msbase, label=label, ftype="primary")
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/workers/crosscal_worker.py", line 394, in solve
2022-05-11 18:29:10 CARACal ERROR:     do_KGBF(i)
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/workers/crosscal_worker.py", line 163, in do_KGBF
2022-05-11 18:29:10 CARACal ERROR:     pipeline.maxdist[iobs], i)
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/caracal/caracal/workers/utils/manage_antennas.py", line 17, in get_refant
2022-05-11 18:29:10 CARACal ERROR:     recipe.run()
2022-05-11 18:29:10 CARACal ERROR:   File "/scratch/ruby/data_processing/bin/Stimela/stimela/recipe.py", line 764, in run
2022-05-11 18:29:10 CARACal ERROR:     raise PipelineException(exc, self.completed, job, self.remaining) from None
2022-05-11 18:29:10 CARACal ERROR: stimela.exceptions.PipelineException: Job 'plotgains-P-0-0:: Plot gaincal phase' failed: docker returns error code 1
2022-05-11 18:29:10 CARACal INFO: exiting with error code 1
rubyvanrooyen commented 2 years ago

@SpheMakh @Mulan-94 I did a test run with some WB and NB data, and it seems that even casa has issues plotting the cal table results. Hold off a little with the investigation to add the bcal plots with bpoly to ragavi I just want to see what is the error and if it would be worth while

As I understand it current -- when I say plotgains: false in the primary worker for crosscal, then no plots are generated.

How difficult will it be to allow plotgains: true, but only skip the bpcal table if the 'P' character is specified in the order?

rubyvanrooyen commented 2 years ago

How to get gain plots with BPOLY calibration tables:

There seem to be a reference requirement for the BPOLY cal table. It keeps the polynomial fit only and needs to refer back to the input MS used to cal in order to calculate the fit for display. The issue is overcome by adding a symbolic link to the MS in the caltables output dir.

Need to add a line to the crosscal worker to make the symlink to the parent directory of the bandpass table.

rubyvanrooyen commented 2 years ago

CASA mstransform does not currently support BPOLY calibration tables.

Do manual applycal before running transform worker on target and disable otf option

rubyvanrooyen commented 2 years ago

After inspection the display issues could be fixed by adding a symbolic link to the MS file in the caltables directory.

diff --git a/caracal/workers/crosscal_worker.py b/caracal/workers/crosscal_worker.py
index 0bb467bd..44e23404 100644
--- a/caracal/workers/crosscal_worker.py
+++ b/caracal/workers/crosscal_worker.py
@@ -191,10 +191,15 @@ def solve(msname, msinfo, recipe, config, pipeline, iobs, prefix, label, ftype,
             params["uvrange"] = config["uvrange"]
             params["scan"] = config[ftype]["scanselection"]
         elif term == "P":
-            # gain plots break for new table format, need to be set to FALSE
+#             # gain plots break for new table format, need to be set to FALSE
+#                 raise RuntimeError("Ragavi gain plots not yet supported for BPOLY cal, ",
+#                                    "set plotgains to false")
             if config[ftype]["plotgains"]:
-                raise RuntimeError("Ragavi gain plots not yet supported for BPOLY cal, ",
-                                   "set plotgains to false")
+                # adding a symlink to caltables for bpoly reference
+                src_ = os.path.abspath(os.path.join(pipeline.msdir, msname))
+                dst_ = os.path.join(pipeline.caltables, msname)
+                if not os.path.exists(dst_):
+                    os.symlink(src_, dst_)
             params["bandtype"] = 'BPOLY'
             params["solnorm"] = config[ftype]["b_solnorm"]
             params["fillgaps"] = config[ftype]["b_fillgaps"]

This allows CASA plots to be generated plotms(vis='J1939-1625501782_sdp_l0-1gc_primary.P1', xaxis='freq', yaxis='phase', field='J1939-6342', coloraxis='baseline')

We should discuss how Ragavi generates gaincal plots to overcome the table structure differences that will allow us to set plotgains: true

rubyvanrooyen commented 2 years ago

Investigate possible options for adding BPOLY gaintable displays to Ragavi

The new table structure will have a ripple affect down the line of software packages and underlying libraries. The major differences are summarised in the comments below.

At the moment it appears that even in the CASA implementation not all functionality supports this structure (such as mstransform task limitation). I am questioning the viability of adding exceptions to deal with the new structure since it still seem to be under development?

Since there is a manual solution to plot the BPOLY gain solutions from CASA, I changed the crosscal_worker code to add a symlink for the user, but then in the plotgains function I ignore that table for display. This way you get at least the other standard cal table displays and only need to generate the bp cal display.

Obviously this is suboptimal because it will not be in the convenient notebooks.

rubyvanrooyen commented 2 years ago

More changes to crosscal_worker to allow applycal of target at the end for the crosscal worker to side step the fact that mstransform does not currently support the BPOLY table structure.

Inspection showed that when you have a BPOLY caltable, you have to specify the gainfields explicitly. So for if applycal for target, remove the nearest default and explicitly insert the gainfields.

diff --git a/caracal/workers/crosscal_worker.py b/caracal/workers/crosscal_worker.py
index 25568830..8abff289 100644
--- a/caracal/workers/crosscal_worker.py
+++ b/caracal/workers/crosscal_worker.py
@@ -653,8 +653,6 @@ def worker(pipeline, recipe, config):
                output=pipeline.output,
                label='{0:s}:: Set jansky ms={1:s}'.format(step, msname))

-
-
         gcal_set = set(pipeline.gcal[i])
         fcal_set = set(pipeline.fcal[i])
         calmode = config["apply_cal"]["calmode"]
@@ -677,8 +675,10 @@ def worker(pipeline, recipe, config):
                 applycal(primary_order, msname, recipe, copy.deepcopy(gaintables), copy.deepcopy(interps),
                          "nearest", "xcal", pipeline, i, calmode=calmode, label=label)
             if "target" in config["apply_cal"]["applyto"]:
+#                 applycal(primary_order, msname, recipe, copy.deepcopy(gaintables), copy.deepcopy(interps),
+#                          "nearest", "target", pipeline, i, calmode=calmode, label=label)
                 applycal(primary_order, msname, recipe, copy.deepcopy(gaintables), copy.deepcopy(interps),
-                         "nearest", "target", pipeline, i, calmode=calmode, label=label)
+                         gainfields, "target", pipeline, i, calmode=calmode, label=label)
         else:
             primary = solve(msname, msinfo, recipe, config, pipeline, i,
                             prefix_msbase, label=label, ftype="primary")
@@ -705,8 +705,10 @@ def worker(pipeline, recipe, config):
                 applycal(secondary_order, msname, recipe, copy.deepcopy(gaintables), interps,
                          "nearest", "xcal", pipeline, i, calmode=calmode, label=label)
             if "target" in config["apply_cal"]["applyto"]:
+#                 applycal(secondary_order, msname, recipe, copy.deepcopy(gaintables), interps,
+#                          "nearest", "target", pipeline, i, calmode=calmode, label=label)
                 applycal(secondary_order, msname, recipe, copy.deepcopy(gaintables), interps,
-                         "nearest", "target", pipeline, i, calmode=calmode, label=label)
+                         gainfields, "target", pipeline, i, calmode=calmode, label=label)

         if {"gcal", "fcal", "target"}.intersection(config["apply_cal"]["applyto"]):
             substep = 'save-{0:s}-ms{1:d}'.format(flags_after_worker, i)
rubyvanrooyen commented 2 years ago

You unfortunately have to calibrate the whole MS, but applycal now work for target. After which the corrected target data can be split out

crosscal:
  enable: true
  label_in: ''
  label_cal: 1gc
  uvrange: '>150'
  set_model:
    enable: true
    # error in 1934 model for this caracal installation, using meqtree model
    meerkat_skymodel: true
  primary:
    reuse_existing_gains: true
    order: KGPAKGP
    combine: ["scan", "spw", "scan,spw", null, "", "", "scan,spw"]
    solint: [inf, inf, inf, null, int, int, inf]
    calmode: [ap, ap, ap, null, ap, ap, ap]
    b_fillgaps: 70
    degamp: 9
    degphase: 3
    visnorm: false
    plotgains: true
  # the default rime is 'KGB', so if you are using the 'P' option
  # you need to specify it for all the calibrators, else the pipeline
  # will attempt to run 'B' which will cause and error, because it does not exist
  secondary:
    reuse_existing_gains: true
    order: KGAKF
    apply: P
    combine: ["scan", "spw", null, "", ""]
    solint: [inf, inf, null, 60s, 60s]
    calmode: [ap, ap, null, ap, ap]
    plotgains: true
  apply_cal:
    applyto:
      - fcal
      - gcal
      - target