caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

Calibration strategy when your primary is also your secondary #1509

Closed: AstroRipples closed this issue 10 months ago

AstroRipples commented 1 year ago

Hi folks. Looking for a bit of guidance here.

Processing MeerKAT L-band data where there's only the one calibrator (1934-638, I'll be having none of this J2000 nonsense) used as both primary and secondary, under different field names and scan intents. This has led to a few fun niche problems that I've already encountered (e.g. #1474) but my most recent problems have got me thinking...

I've noticed that the inspect worker always fails on the secondary:

# 2023-05-14 03:27:30: Running ragavi-vis --ms /stimela_mount/msdir/1551621148-cal.MS --xaxis antenna1 --yaxis amp --canvas-height 720 --canvas-width 1080 --corr XX,YY --data-column CORRECTED_DATA --field J1939gaincal --htmlname /stimela_mount/output/1kgb-1551621148-cal-gcal-J1939gaincal-amp_ant.html --iter-axis corr --mem-limit 8GB --num-cores 8
# 14.05.2023@03:27:34 - ragavi.visibilities  - INFO       - Total RAM size: ~503.80 GB
# 14.05.2023@03:27:34 - ragavi.visibilities  - INFO       - Total number of Cores: 128
# 14.05.2023@03:27:34 - ragavi.visibilities  - INFO       - Using 8 cores
# 14.05.2023@03:27:34 - ragavi.visibilities  - INFO       - Memory limit per core: 8GB
# 14.05.2023@03:27:34 - ragavi.visibilities  - INFO       - Available corrs: XX,XY,YX,YY
# 14.05.2023@03:27:34 - ragavi.visibilities  - ERROR      - MS data acquisition failed
# 14.05.2023@03:27:34 - ragavi.visibilities  - ERROR      - list index out of range
# 2023-05-14 03:27:34: ragavi-vis exited with code 255
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR: cd /local/work/riseley/MeerKAT/Abell_3667/.stimela_workdir-1684026381815617 && singularity returns error code 1
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR: job failed at 2023-05-14 03:27:34.847640 after 0:00:05.179063
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR: Traceback (most recent call last):
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/stimela/recipe.py", line 710, in run
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:     job.run_job()
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/stimela/recipe.py", line 422, in run_job
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:     self.job.run(output_wrangler=self.apply_output_wranglers)
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/stimela/singularity.py", line 124, in run
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:     utils.xrun(f"cd {self.execdir} && singularity", ["run", "--workdir", self.execdir, "--containall"] \
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/stimela/utils/xrun_poll.py", line 227, in xrun
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR:     raise StimelaCabRuntimeError("{} returns error code {}".format(command_name, status))
2023-05-14 03:27:34 CARACal.Stimela.plot-amp_ant-0-gcal ERROR: stimela.utils.StimelaCabRuntimeError: cd /local/work/riseley/MeerKAT/Abell_3667/.stimela_workdir-1684026381815617 && singularity returns error code 1
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Completed jobs : ['plot-amp_ant-0-bpcal']
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Remaining jobs : ['plot-amp_phase-0-bpcal', 'plot-amp_phase-0-gcal', 'plot-amp_scan-0-bpcal', 'plot-amp_scan-0-gcal', 'plot-amp_uvwave-0-bpcal', 'plot-amp_uvwave-0-gcal', 'plot-phase_uvwave-0-bpcal', 'plot-phase_uvwave-0-gcal', 'plot-real_imag-0-bpcal', 'plot-real_imag-0-gcal']
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Logging remaining task: 1kgb
2023-05-14 03:27:34 CARACal.Stimela.inspect INFO: Saving pipeline information in .last_inspect.json
2023-05-14 03:27:34 CARACal ERROR: Job '1kgb' failed: cd /local/work/riseley/MeerKAT/Abell_3667/.stimela_workdir-1684026381815617 && singularity returns error code 1 [PipelineException]
2023-05-14 03:27:34 CARACal INFO:   More information can be found in the logfile at OutputData/logs-20230513-114214/log-caracal.txt
2023-05-14 03:27:34 CARACal INFO:   You are running version 1.0.4
2023-05-14 03:27:34 CARACal ERROR: Traceback (most recent call last):
2023-05-14 03:27:34 CARACal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/caracal/main.py", line 183, in __run
2023-05-14 03:27:34 CARACal ERROR:     pipeline.run_workers()
2023-05-14 03:27:34 CARACal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/caracal/workers/worker_administrator.py", line 443, in run_workers
2023-05-14 03:27:34 CARACal ERROR:     recipe.run()
2023-05-14 03:27:34 CARACal ERROR:   File "/homes/riseley/caracal/lib/python3.9/site-packages/stimela/recipe.py", line 761, in run
2023-05-14 03:27:34 CARACal ERROR:     raise PipelineException(exc, self.completed, job, self.remaining) from None
2023-05-14 03:27:34 CARACal ERROR: stimela.exceptions.PipelineException: Job '1kgb' failed: cd /local/work/riseley/MeerKAT/Abell_3667/.stimela_workdir-1684026381815617 && singularity returns error code 1
2023-05-14 03:27:34 CARACal INFO: exiting with error code 1

Until now I've been using the standard calibration schema, but after skirting the inspect worker issues and manually applying the solutions, the resulting maps have a really high RMS, roughly an order of magnitude too high. Putting it all together has got me thinking...

Is CARACal doing some under-the-hood selection of fields that means there isn't any calibration information for the secondary in the solution table that inspect is trying to access? It sure looks like it... and if so, am I using the wrong schema for this situation, where your primary is also your secondary? I've noticed some past discussions (e.g. #565) around this, so I was wondering whether this is a solved problem and whether there's an "approved" schema for this situation?
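One rough way to test that suspicion is to compare the field IDs that actually carry solutions in a gain table against the field names in the MS. The sketch below is only an illustration: the helper name `fields_missing_from_caltable` and the example values are hypothetical, and it assumes the table's `FIELD_ID` column and the MS field names have already been read into plain Python lists (for instance with python-casacore's `table(...).getcol(...)`).

```python
from collections import Counter


def fields_missing_from_caltable(caltable_field_ids, field_names, wanted):
    """Return the names in `wanted` that have no solutions in the table.

    caltable_field_ids: FIELD_ID values, one per row of the solution table.
    field_names: field names from the MS, indexed by field ID.
    wanted: field names we expect to find solutions for.
    """
    rows_per_field = Counter(caltable_field_ids)
    name_to_id = {name: fid for fid, name in enumerate(field_names)}
    missing = []
    for name in wanted:
        fid = name_to_id.get(name)
        # Missing if the MS does not know the field, or no table row uses it.
        if fid is None or rows_per_field[fid] == 0:
            missing.append(name)
    return missing


# Hypothetical example: every solution row is attached to field 0 only.
field_names = ["J1939-6342", "Abell3667", "J1939gaincal"]
caltable_field_ids = [0, 0, 0, 0]
print(fields_missing_from_caltable(
    caltable_field_ids, field_names, ["J1939-6342", "J1939gaincal"]))
```

If the secondary's name shows up in the returned list, the plotting step has nothing to select for that field, which would be consistent with (though not proof of) the failure above.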

KshitijT commented 1 year ago

@AstroRipples could you please share your config file?

AstroRipples commented 1 year ago

Sure thing Kshitij, and thanks! There's probably something relatively simple I'm missing, so hopefully this will be a simple fix 😅

Here it is: caracal_initcal_continuum_a3667-1551621148_advanced.yml.txt

AstroRipples commented 1 year ago

Hi @KshitijT ... just checking in, any progress on this?

AstroRipples commented 1 year ago

Tagging in a few people who may have come across this before ... @o-smirnov , @IanHeywood , do either of you have any experience in handling datasets (in CARACal) where a primary calibrator is also interleaved as the secondary?

... Bueller ?

pjmac1105 commented 1 year ago

@AstroRipples yes, very recently, with some L-band data on those 3 candidate ORCs. I'm on my phone, so I'll look in the morning when I'm in the office and send you the config file. From memory, the image I got out was excellent.

pjmac1105 commented 1 year ago

@AstroRipples here is the cal strategy I used for ORCs 2 and 3, which has 1934 as both primary and secondary. I used this calibration for the candidates we've got, and got excellent images with low RMS. Note that imaging was also done with this script. This is MeerKAT L-band data, 8 hours.

orc23_Lband.yml.txt

AstroRipples commented 10 months ago

In the end, playing around with CASA allowed me to solve the underlying issue. Will close.

The root cause of the problem seems to be that the dataset had J1939-6342 as both primary and secondary calibrator -- listed in the metadata with different scan intents and names but the same source -- and the standard CARACal workflow seems to not handle this very well. Finding a robust workaround is probably a low priority task, but until one is found, treat these situations with extreme caution.
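For anyone wanting to spot this situation up front, a minimal sketch of a check is given below: it groups fields whose phase centres coincide within a small tolerance, so that one source listed under two names is flagged before calibration. The function names, the tolerance, and the coordinates are illustrative assumptions, not anything CARACal provides; in practice the names and directions would come from the MS FIELD table.

```python
import math


def angular_sep_rad(ra1, dec1, ra2, dec2):
    """Angular separation in radians (haversine formula)."""
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, a)))


def same_source_groups(fields, tol_arcsec=1.0):
    """Group field names that share a phase centre within `tol_arcsec`.

    fields: list of (name, ra_rad, dec_rad) tuples.
    Returns only the groups containing more than one field name.
    """
    tol = math.radians(tol_arcsec / 3600.0)
    groups = []  # each entry: (ra, dec, [names]) keyed on its first member
    for name, ra, dec in fields:
        for g_ra, g_dec, names in groups:
            if angular_sep_rad(ra, dec, g_ra, g_dec) <= tol:
                names.append(name)
                break
        else:
            groups.append((ra, dec, [name]))
    return [names for _, _, names in groups if len(names) > 1]


# Illustrative, approximate coordinates in degrees (assumed, not from the MS).
cal = (math.radians(294.8543), math.radians(-63.7127))
target = (math.radians(303.1), math.radians(-56.8))
fields = [
    ("J1939-6342", *cal),    # hypothetical name for the primary
    ("Abell3667", *target),  # hypothetical name for the target
    ("J1939gaincal", *cal),  # secondary name as it appears in the log
]
print(same_source_groups(fields))
```

Any group this returns is a candidate for the primary-equals-secondary case described above and is worth handling by hand.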