ACES-CMZ / reduction_ACES

Reduction scripts and tools for ACES
https://worldwidetelescope.org/webclient/?wtml=https://data.rc.ufl.edu/pub/adamginsburg/ACES/mosaics/mosaics.wtml

Execution Block ID uid://A001/X15a0/Xc4 Sgr_A_st_g_03_TM1 #134

Open keflavich opened 2 years ago

keflavich commented 2 years ago

Sgr_A_st_g_03_TM1 uid://A001/X15a0/Xc4

Product Links:

Reprocessed Product Links:

nbudaiev commented 2 years ago

All SPWs are binned x2 (see #179). Otherwise, the data look good.

keflavich commented 1 year ago

In the line mosaics, it looks like these data are in a broken state. All of the images look blank. @bazarsen could you have another look at these? I'm going to delete the bad ones.

keflavich commented 1 year ago

I also need to delete the entire calibrated directory and re-run the pipeline

keflavich commented 1 year ago

The pipeline failures included:

2022-07-10 04:39:15     INFO    MSTransformManager::createOutputMSStructure     Create output MS structure
2022-07-10 04:39:16     SEVERE  mstransform::::casa     Task mstransform raised an exception of class RuntimeError with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms).
2022-07-10 04:39:16     INFO    mstransform::::casa     Task mstransform complete. Start time: 2022-07-10 00:39:14.894255 End time: 2022-07-10 00:39:15.828004
2022-07-10 04:39:16     INFO    mstransform::::casa     ##### End Task: mstransform          #####
2022-07-10 04:39:16     INFO    mstransform::::casa     ##########################################
2022-07-09 12:33:23     INFO    flagmanager::::casa     ##########################################
2022-07-09 12:33:23     INFO    flagmanager::::casa     ##### Begin Task: flagmanager        #####
2022-07-09 12:33:23     INFO    flagmanager::::casa     flagmanager( vis='uid___A002_Xf8f6a9_X9e67.ms', mode='restore', versionname='Pipeline_Final', oldname='', comment='', merge='replace' )
2022-07-09 12:33:23     INFO    flagmanager::AgentFlagger::open Table type is Measurement Set
2022-07-09 12:33:23     INFO    flagmanager::::casa     Restore flagversions Pipeline_Final
2022-07-09 12:33:23     SEVERE  AgentFlagger::restoreFlagVersion (file src/code/flagging/Flagging/AgentFlagger.cc, line 1001)   Could not restore Flag Version : ScalarColumn::putColumn(Vector&): Table conformance error (#rows mismatch)
2022-07-09 12:33:24     SEVERE  agentflagger:: (file src/tools/agentflagger/agentflagger_cmpt.cc, line 35)      Exception Reported: ScalarColumn::putColumn(Vector&): Table conformance error (#rows mismatch)
2022-07-09 12:33:24     SEVERE  flagmanager::::casa     Task flagmanager raised an exception of class RuntimeError with the following message: ScalarColumn::putColumn(Vector&): Table conformance error (#rows mismatch)
2022-07-09 12:33:24     INFO    flagmanager::::casa     Task flagmanager complete. Start time: 2022-07-09 08:33:22.556263 End time: 2022-07-09 08:33:23.744330
2022-07-09 12:33:24     INFO    flagmanager::::casa     ##### End Task: flagmanager          #####
2022-07-09 12:33:24     INFO    flagmanager::::casa     ##########################################

which is an identical failure mode to #132

keflavich commented 1 year ago

Nov 14 pipeline run:

2022-11-11 05:37:39     INFO    mstransform::::casa     ##########################################
2022-11-11 05:37:39     INFO    mstransform::::casa     ##### Begin Task: mstransform        #####
2022-11-11 05:37:39     INFO    mstransform::::casa     mstransform( vis='uid___A002_Xf8f6a9_X9e67.ms', outputvis='uid___A002_Xf8f6a9_X9e67_target.ms', createmms=False, separationaxis='auto', numsubms='auto', tileshape=[0], field='3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143', spw='25,27,29,31,33,35', scan='', antenna='', correlation='', timerange='', intent='OBSERVE_TARGET#ON_SOURCE', array='', uvrange='', observation='', feed='', datacolumn='corrected', realmodelcol=False, keepflags=True, usewtspectrum=False, combinespws=False, chanaverage=False, chanbin=1, hanning=False, regridms=False, mode='channel', nchan=-1, start=0, width=1, nspw=1, interpolation='linear', phasecenter='', restfreq='', outframe='', veltype='radio', preaverage=False, timeaverage=False, timebin='0s', timespan='', maxuvwdistance=0.0, docallib=False, callib='', douvcontsub=False, fitspw='', fitorder=0, want_cont=False, denoising_lib=True, nthreads=1, niter=1, disableparallel=False, ddistart=-1, taql='', monolithic_processing=False, reindex=False )
2022-11-11 05:37:39     INFO    MSTransformManager::parseMsSpecParams   Input file name is uid___A002_Xf8f6a9_X9e67.ms
2022-11-11 05:37:39     INFO    MSTransformManager::parseMsSpecParams   Data column is CORRECTED
2022-11-11 05:37:39     INFO    MSTransformManager::parseMsSpecParams   Output file name is uid___A002_Xf8f6a9_X9e67_target.ms
2022-11-11 05:37:39     INFO    MSTransformManager::parseMsSpecParams   Re-index is disabled
2022-11-11 05:37:39     INFO    MSTransformManager::parseMsSpecParams   Tile shape is [0]
2022-11-11 05:37:39     INFO    MSTransformManager::parseDataSelParams  field selection is 3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143
2022-11-11 05:37:39     INFO    MSTransformManager::parseDataSelParams  spw selection is 25,27,29,31,33,35
2022-11-11 05:37:39     INFO    MSTransformManager::parseDataSelParams  scan intent selection is OBSERVE_TARGET#ON_SOURCE
2022-11-11 05:37:39     WARN    MSTransformManager::checkDataColumnsToFill      CORRECTED_DATA column requested but not available in input MS
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams     Selected SPWs Ids are Axis Lengths: [6, 4]  (NB: Matrix in Row/Column order)
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+    [25, 0, 1919, 1
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+     27, 0, 1919, 1
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+     29, 0, 1919, 1
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+     31, 0, 1919, 1
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+     33, 0, 3839, 1
2022-11-11 05:37:39     INFO    MSTransformManager::initDataSelectionParams+     35, 0, 3839, 1]
2022-11-11 05:37:39     INFO    MSTransformManager::open        Select data
2022-11-11 05:37:39     INFO    MSTransformManager::createOutputMSStructure     Create output MS structure
2022-11-11 05:37:39     SEVERE  mstransform::::casa     Task mstransform raised an exception of class RuntimeError with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms).
2022-11-11 05:37:39     INFO    mstransform::::casa     Task mstransform complete. Start time: 2022-11-11 00:37:39.255101 End time: 2022-11-11 00:37:39.405362
2022-11-11 05:37:39     INFO    mstransform::::casa     ##### End Task: mstransform          #####
2022-11-11 05:37:39     INFO    mstransform::::casa     ##########################################

So it looks like this entire MOUS is a failure. I'm going to delete everything and start fresh.

keflavich commented 1 year ago

Nov 25 pipeline run ends the same way:

2022-11-21 05:34:31     SEVERE  mstransform::::casa     Task mstransform raised an exception of class RuntimeError with the following message: Desired column (CORRECTED_DATA) not found in the input MS (/orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xb2/calibrated/working/uid___A002_Xfe3986_X9083.ms).

keflavich commented 1 year ago

Interestingly, this error occurs on line 1436 but doesn't crash the pipeline; it just keeps going. I think the pipeline should fail at this point.
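
One way to make a run stop on errors like this is to scan the CASA log for SEVERE entries after each stage. The following is a minimal sketch, not part of the ACES scripts: the function names `find_severe` and `check_log` are hypothetical, and the line layout (timestamp, priority, origin, message separated by whitespace) is assumed from the excerpts quoted in this thread.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout of a CASA log line: timestamp, priority, origin, message,
# separated by runs of whitespace (as in the excerpts quoted in this thread).
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+([A-Z]+)\s+(\S+)\s+(.*)$"
)


class LogEntry(NamedTuple):
    lineno: int
    timestamp: str
    priority: str
    origin: str
    message: str


def find_severe(lines: Iterable[str]) -> List[LogEntry]:
    """Return every SEVERE entry together with its 1-based line number."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group(2) == "SEVERE":
            hits.append(LogEntry(lineno, *match.groups()))
    return hits


def check_log(path: str) -> None:
    """Raise RuntimeError if the log at `path` contains any SEVERE entry."""
    with open(path, errors="replace") as handle:
        severe = find_severe(handle)
    if severe:
        first = severe[0]
        raise RuntimeError(
            f"{len(severe)} SEVERE entries; first at line {first.lineno}: "
            f"{first.message}"
        )
```

Calling `check_log` on the pipeline log right after the restore step would turn a silently ignored SEVERE message into a hard failure of the wrapper script.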

I need some expert help here - @d-l-walker @piposona @pyhsiehATalma, this looks like a pipeline problem to me. Can any of you successfully restore the data? If so, what are you doing differently?

The first 10,000 lines of the log file are here (I didn't post the full log because it's >100 MB): casa_log_mpi_pipeline_imaging_member.uid___A001_X15a0_Xb2_52087485_2022-11-21_00_02_47.first10000lines.log

Data check:

(python39) login4.ufhpc /orange/adamginsburg/ACES/data$ ls -lh *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787*
-rw-r--r-- 1 adamginsburg adamginsburg  60G Nov 15 22:40 2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  60G Nov 16 02:37 2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 118G Oct 26 08:19 corrupt_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  65G Oct 26 11:06 corrupt_2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
(python39) login4.ufhpc /orange/adamginsburg/ACES/data$ md5sum *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787*
d78bebe11bcde4d22ce3697ae9f5caf4  2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
bef9fc5181ad37f94b96460b92ad9116  corrupt_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
1c7551e38e3378825187314b0438787f  2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar

d-l-walker commented 1 year ago

Hi @keflavich -- I can try to take a look at this next week. There's a scheduled electrical shutdown at Manchester this weekend, so there's no point in me setting any jobs running now.

My first thought looking at the log file is that you're using CASA Version PIPELINE 6.4.3.2, whereas the delivered data were processed using CASA Version 6.2.1.7. Maybe try restoring the calibration using that pipeline version to rule it out as the cause?

keflavich commented 1 year ago

I can try that.

Could you (anyone) verify the MD5sums of the ASDM files by downloading a fresh copy?
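
For anyone comparing copies from Python rather than with `md5sum`, here is a minimal sketch that hashes a file in chunks so that a ~60 GB tarball does not have to fit in memory. The names `md5sum` and `same_content` and the 1 MiB chunk size are illustrative choices, not something used elsewhere in this thread.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files have the same MD5 digest."""
    return md5sum(path_a) == md5sum(path_b)
```

The digest string should match the first column printed by `md5sum` for the same file, so a checksum posted by someone else can be compared directly.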

pyhsiehATalma commented 1 year ago

Hi @keflavich, which ASDMs do you mean? The MOUS in the log file (X15a0_Xb2) is Sgr_A_st_d_03_TM1, but this issue is about Sgr_A_st_g_03_TM1.

(casa_log_mpi_pipeline_imaging_member.uid___A001_X15a0_Xb2_52087485_2022-11-21_00_02_47.first10000lines.log)

Sgr_A_st_g_03_TM1
globus ls $hpg:\~$hpg_pth"member.uid___A001_X15a0_Xc4/raw"
uid___A002_Xf8f6a9_X113a4.asdm.sdm/
uid___A002_Xf8f6a9_X9e67.asdm.sdm/

Sgr_A_st_d_03_TM1
globus ls $hpg:\~$hpg_pth"member.uid___A001_X15a0_Xb2/raw"
uid___A002_Xfe3986_X9083.asdm.sdm/
uid___A002_Xfe62c1_X1871.asdm.sdm/

Which SB do these two corrupted files belong to?
2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar

keflavich commented 1 year ago

Removed calibrated/ directory.

keflavich commented 1 year ago

All of these files were reimaged earlier in the month, but they still appear to be junk:

$ ls -lhrtd /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s*_0.Sgr_A_star_sci.spw*.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 13 23:57 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s12_0.Sgr_A_star_sci.spw25.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 14 15:31 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s12_0.Sgr_A_star_sci.spw27.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 15 08:35 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s12_0.Sgr_A_star_sci.spw29.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 17 19:00 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s38_0.Sgr_A_star_sci.spw31.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 18 06:09 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s38_0.Sgr_A_star_sci.spw35.cube.I.iter1.image
drwxrwsr-x+ 4 adamginsburg adamginsburg 4.0K Jan 18 21:27 /orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A001_X15a0_Xc4.s38_0.Sgr_A_star_sci.spw33.cube.I.iter1.image

EDIT: but these images were produced before the latest download of ASDMs:

$ ls -lhrtd *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787*
-rw-r--r-- 1 adamginsburg adamginsburg 118G Oct 26 08:19 corrupt_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  65G Oct 26 11:06 corrupt_2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  60G Nov 15 22:40 corrupt_Jan2023_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  60G Nov 16 02:37 corrupt_Jan2023_2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  62G Jan 18 14:07 2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  61G Jan 18 15:36 2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
$ md5sum *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787*
0b7379e1373119e2000a4f8cf1c4819b  2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
bef9fc5181ad37f94b96460b92ad9116  corrupt_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
d78bebe11bcde4d22ce3697ae9f5caf4  corrupt_Jan2023_2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
a43ea10b31b9d132d17f0562207db93a  2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
ac82e935b587a3cfb868483b84ba16ef  corrupt_2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
1c7551e38e3378825187314b0438787f  corrupt_Jan2023_2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar

keflavich commented 1 year ago

I re-extracted the tarballs. The pipeline failed, however, with:

2023-02-13 09:59:48     INFO    applycal::::casa        ##########################################
2023-02-13 09:59:48     INFO    applycal::::casa        ##### Begin Task: applycal           #####
2023-02-13 09:59:48     INFO    applycal::::casa        applycal( vis='uid___A002_Xf8f6a9_X9e67.ms', field='3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143', spw='25,27,29,31,33,35', intent='OBSERVE_TARGET#ON_SOURCE', selectdata=True, timerange='', uvrange='', antenna='*&*', scan='', observation='', msselect='', docallib=False, callib='', gaintable=['uid___A002_Xf8f6a9_X9e67.ms.hif_uvcontfit.s5_1.Sgr_A_star.uvcont.tbl'], gainfield=[], interp=[], spwmap=[], calwt=[False], parang=False, applymode='calflag', flagbackup=True )
2023-02-13 09:59:48     INFO    applycal::calibrater::open      ****Using NEW VI2-driven calibrater tool****
2023-02-13 09:59:48     INFO    applycal::calibrater::open      Opening MS: uid___A002_Xf8f6a9_X9e67.ms for calibration.
2023-02-13 09:59:48     INFO    applycal::VisSetUtil::addScrCols        Adding CORRECTED_DATA column(s).
2023-02-13 10:00:18     INFO    applycal::::    Process 40487: waiting for write-lock on file /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms/table.lock

i.e., it locked itself out of running the pipeline. I think the only thing to do with this is delete it and start over.
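
To see how often this happens and which tables are involved, the wait messages can be pulled out of a log. This is a small sketch only: `write_lock_waits` is a hypothetical helper, and the pattern is copied from the message quoted above, so other CASA versions may word it differently.

```python
import re
from typing import Iterable, List, Tuple

# Pattern taken from the message quoted above; treat the wording as an assumption.
WRITE_LOCK = re.compile(r"Process (\d+): waiting for write-lock on file (\S+)")


def write_lock_waits(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (process id, lock file) for each write-lock wait message."""
    waits = []
    for line in lines:
        match = WRITE_LOCK.search(line)
        if match:
            waits.append((int(match.group(1)), match.group(2)))
    return waits
```

Running this over the logs of several failed runs gives a quick list of which `table.lock` files were contended and by which process IDs.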

keflavich commented 1 year ago

Pipeline rerun triggered. Let's see if it gets past that point this time.

keflavich commented 1 year ago

I think there were multiple clones of the pipeline running that managed to conflict and create that write lock. Not sure how that's even possible since the pipeline shouldn't be able to start if the calibrated/ directory exists, but race conditions happen.
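
A common guard against this kind of race is to take an exclusive lock before touching the working directory. The sketch below is an illustration and not what the ACES scripts do: it relies on `os.mkdir` failing if the directory already exists, which is atomic on local POSIX filesystems but is only assumed to behave the same way on a shared filesystem like the one used here. The name `run_lock` and the lock path are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def run_lock(lock_dir: str):
    """Hold an exclusive lock by creating `lock_dir`; fail if it already exists."""
    try:
        os.mkdir(lock_dir)  # only one process can create the directory
    except FileExistsError:
        raise RuntimeError(f"another run appears to hold {lock_dir}") from None
    try:
        yield lock_dir
    finally:
        # Release the lock even if the wrapped code raised.
        os.rmdir(lock_dir)
```

A wrapper would then do `with run_lock("member.uid___A001_X15a0_Xc4.lock"): ...` around the pipeline call, so a second clone fails immediately instead of fighting over `table.lock`. A job that is killed outright leaves the lock directory behind, which then has to be removed by hand.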

keflavich commented 1 year ago

Pipeline failed again at

2023-02-15 17:14:23 INFO: Restoring calibration state for uid___A002_Xf8f6a9_X9e67.ms from ../rawdata/uid___A002_Xf8f6a9_X9e67.ms.calapply.txt
2023-02-15 17:14:23 INFO: Importing calibration state from /scratch/local/57260167/tmp1yasll_n
2023-02-15 17:14:23 WARNING: Could not access uid___A002_Xf8f6a9_X9e67.ms.hif_uvcontfit.s5_1.Sgr_A_star.uvcont.tbl. Using heuristics to determine caltable type

which suggests a problem with the auxiliary tarball. I'm removing and re-downloading everything:

$ ls -lhrt *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787* *Xc4*
-rw-r--r-- 1 adamginsburg adamginsburg 146G Jun 23  2022 2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
-rw-r--r-- 1 adamginsburg adamginsburg 749M Jun 23  2022 2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
-rw-r--r-- 1 adamginsburg adamginsburg 3.5K Aug 23 05:54 member.uid___A001_X15a0_Xc4.README.txt
-rw-r--r-- 1 adamginsburg adamginsburg 3.7G Dec 17 02:14 2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  62G Jan 18 14:07 2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  61G Jan 18 15:36 2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar

keflavich commented 1 year ago

ok. so. I downloaded everything again. Extracted it all again. Total fresh start of everything. This again:

2023-02-16 09:34:28     INFO    applycal::::    Process 57713: waiting for write-lock on file /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms/table.lock

???

keflavich commented 1 year ago

Just to check file integrity:

$ ls -lhrt *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787* *Xc4*
-rw-r--r-- 1 adamginsburg adamginsburg  60G Feb 16 01:48 2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  60G Feb 16 04:03 2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 3.7G Feb 16 04:13 2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 3.5K Feb 16 04:14 member.uid___A001_X15a0_Xc4.README.txt
-rw-r--r-- 1 adamginsburg adamginsburg 749M Feb 16 10:39 2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
-rw-r--r-- 1 adamginsburg adamginsburg 149G Feb 16 10:41 2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
$ md5sum *uid___A002_Xfed4ee_X1e3* *uid___A002_Xfee03e_X2787* *Xc4*
7f699bcd2e75c1ed5737e75a720f98d0  2021.1.00172.L_uid___A002_Xfed4ee_X1e3.asdm.sdm.tar
61f181a22f41300c2f24c2811db11e77  2021.1.00172.L_uid___A002_Xfee03e_X2787.asdm.sdm.tar
9723ccfc4279aa66d74a311a0dfb5286  2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
efd5d049e16d153a86a0efa4ed9a654c  2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
d438a1777c13584d97e94cfd192b4340  2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar
45243dd5a6a80b5da300bb9d2ff21537  member.uid___A001_X15a0_Xc4.README.txt

keflavich commented 1 year ago

It looks like I got the ASDM names wrong somehow.

From the pipeline run, I see that there are ASDMs from:

2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar

which do not match the IDs of those I posted above. It was probably a copy-paste error.
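
A mismatch like this can be caught mechanically by comparing the ASDM UIDs the pipeline expects with the UIDs embedded in the tarball names. The sketch below is illustrative only: `asdm_uids` and `compare_asdms` are hypothetical helpers, and the UID pattern (`uid___A002_X<hex>_X<hex>`) is an assumption based on the file names quoted in this thread.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Assumption: ASDM UIDs look like uid___A002_X<hex>_X<hex>, as in this thread.
ASDM_UID = re.compile(r"uid___A002_X[0-9a-f]+_X[0-9a-f]+")


def asdm_uids(names: Iterable[str]) -> List[str]:
    """Extract one ASDM UID per name, skipping names that contain none."""
    uids = []
    for name in names:
        match = ASDM_UID.search(name)
        if match:
            uids.append(match.group(0))
    return uids


def compare_asdms(
    expected: Iterable[str], tarballs: Iterable[str]
) -> Dict[str, List[str]]:
    """Report missing, unexpected, and repeated ASDM UIDs among the tarballs."""
    wanted = set(expected)
    counts = Counter(asdm_uids(tarballs))
    return {
        "missing": sorted(wanted - set(counts)),
        "unexpected": sorted(set(counts) - wanted),
        "duplicated": sorted(uid for uid, n in counts.items() if n > 1),
    }
```

The `duplicated` entry also flags a command line that names the same tarball twice, which is an easy slip when the two UIDs differ only in their last field.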

$ ls -lh 2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 59G Jul  5  2022 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 59G Jul  5  2022 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
$ md5sum 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
3855bd5564fa38e500b1718452bf00c4  2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
3855bd5564fa38e500b1718452bf00c4  2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
$ mv 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar bad_tarballs/

keflavich commented 1 year ago

Continuum imaging might be OK? There's virtually no signal; the brightest peak is ~0.3 mJy.

image

keflavich commented 1 year ago

Despite that continuum looking OK, the g pipeline run failed with a timeout/writelock failure. There may still be good images in failed_member.uid___A001_X15a0_Xc4_20230224 but the failed pipeline is a red flag that can't be ignored:

2023-02-20 09:00:42     INFO    applycal::::casa        ##########################################
2023-02-20 09:00:42     INFO    applycal::::casa        ##### Begin Task: applycal           #####
2023-02-20 09:00:42     INFO    applycal::::casa        applycal( vis='/orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67_target.ms', field='Sgr_A_star', spw='25,27,29,31,33,35', intent='OBSERVE_TARGET#ON_SOURCE', selectdata=True, timerange='', uvrange='', antenna='*&*', scan='', observation='', msselect='', docallib=False, callib='', gaintable=['uid___A002_Xf8f6a9_X9e67_target.ms.hif_uvcontfit.s5_1.Sgr_A_star.uvcont.tbl'], gainfield=[], interp=[], spwmap=[], calwt=[False], parang=False, applymode='calflag', flagbackup=True )
2023-02-20 09:00:42     INFO    applycal::calibrater::open      ****Using NEW VI2-driven calibrater tool****
2023-02-20 09:00:42     INFO    applycal::calibrater::open      Opening MS: /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67_target.ms for calibration.
2023-02-20 09:00:42     INFO    applycal::VisSetUtil::addScrCols        Adding CORRECTED_DATA column(s).
2023-02-20 09:01:11     INFO    applycal::::    Process 128170: waiting for write-lock on file /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67_target.ms/table.lock

Rerun will start with:

$ ls -lh 2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar  *Xc4*tar
-rw-r--r-- 1 adamginsburg adamginsburg 149G Feb 16 10:41 2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
-rw-r--r-- 1 adamginsburg adamginsburg 749M Feb 16 10:39 2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
-rw-r--r-- 1 adamginsburg adamginsburg  59G Feb 18 18:52 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  59G Feb 18 18:52 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 3.7G Feb 16 04:13 2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar

keflavich commented 1 year ago

A new data point for tracing this down: the applycal task in the above message used the full path. At least one other example of applycal that was successful did not use the full path.

keflavich commented 1 year ago

I typo'd above and did not successfully re-download the second ASDM, which caused layered problems.

keflavich commented 1 year ago

:cry:

2023-02-24 22:48:26     INFO    applycal::::casa        ##########################################
2023-02-24 22:48:26     INFO    applycal::::casa        ##### Begin Task: applycal           #####
2023-02-24 22:48:26     INFO    applycal::::casa        applycal( vis='uid___A002_Xf8f6a9_X9e67.ms', field='3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143', spw='25,27,29,31,33,35', intent='OBSERVE_TARGET#ON_SOURCE', selectdata=True, timerange='', uvrange='', antenna='*&*', scan='', observation='', msselect='', docallib=False, callib='', gaintable=['uid___A002_Xf8f6a9_X9e67.ms.hif_uvcontfit.s5_1.Sgr_A_star.uvcont.tbl'], gainfield=[], interp=[], spwmap=[], calwt=[False], parang=False, applymode='calflag', flagbackup=True )
2023-02-24 22:48:26     INFO    applycal::calibrater::open      ****Using NEW VI2-driven calibrater tool****
2023-02-24 22:48:26     INFO    applycal::calibrater::open      Opening MS: uid___A002_Xf8f6a9_X9e67.ms for calibration.
2023-02-24 22:48:26     INFO    applycal::VisSetUtil::addScrCols        Adding CORRECTED_DATA column(s).
2023-02-24 22:48:59     INFO    applycal::::    Process 77329: waiting for write-lock on file /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms/table.lock
$ ls -lh 2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar  *Xc4*tar
-rw-r--r-- 1 adamginsburg adamginsburg 149G Feb 16 10:41 2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
-rw-r--r-- 1 adamginsburg adamginsburg 749M Feb 16 10:39 2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
-rw-r--r-- 1 adamginsburg adamginsburg  59G Feb 25 01:24 2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg  59G Feb 18 18:52 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
-rw-r--r-- 1 adamginsburg adamginsburg 3.7G Feb 16 04:13 2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar
$ md5sum 2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar 2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar  *Xc4*tar
ef7be11430edc54ae8f5ec847f25cf78  2021.1.00172.L_uid___A002_Xf8f6a9_X113a4.asdm.sdm.tar
fe73d03b5f47fbb24caa69c33a84ccaf  2021.1.00172.L_uid___A002_Xf8f6a9_X9e67.asdm.sdm.tar
9723ccfc4279aa66d74a311a0dfb5286  2021.1.00172.L_uid___A001_X15a0_Xc4_001_of_001.tar
efd5d049e16d153a86a0efa4ed9a654c  2021.1.00172.L_uid___A001_X15a0_Xc4_auxiliary.tar
d438a1777c13584d97e94cfd192b4340  2021.1.00172.L_uid___A002_Xfe90b7_Xc423.asdm.sdm.tar

next approach:

keflavich commented 1 year ago

This run appears to be successful:

-rw-r--r-- 1 adamginsburg adamginsburg  36K Feb 25 21:14 casa_log_mpi_pipeline_58429783_2023-02-25_21_14_17.log
-rw-r--r-- 1 adamginsburg adamginsburg 860K Feb 26 09:34 casa_log_mpi_pipeline_member.uid___A001_X15a0_Xc4_58429783_2023-02-25_21_14_43.log
-rw-r--r-- 1 adamginsburg adamginsburg 1.2M Feb 26 09:34 run_pipeline_mpi_58429783.log

At least, there are no errors. The corresponding interactive run failed because plotms doesn't work on HiPerGator and I didn't apply the hack before running it.

Next step is to try imaging.

These are the MSes:

drwxrwsr-x+ 28 adamginsburg adamginsburg 4.0K Feb 25 22:24 /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67.ms
drwxrwsr-x+ 28 adamginsburg adamginsburg 4.0K Feb 25 23:00 /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X113a4.ms
drwxrwsr-x+ 28 adamginsburg adamginsburg 4.0K Feb 26 00:27 /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67_target.ms
drwxrwsr-x+ 28 adamginsburg adamginsburg 4.0K Feb 26 00:30 /orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X113a4_target.ms

keflavich commented 1 year ago

Continuum is good, lines are good. 25 and 27 are still going, though.

keflavich commented 1 year ago

25 died with a weird failure, and now is dying on startup with a major and unacceptable error:

2023-03-31 19:55:25     INFO    split::::casa+  RuntimeError: Desired column (CORRECTED_DATA) not found in the input MS (/orange/adamginsburg/ACES/rawdata/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_Xc4/calibrated/working/uid___A002_Xf8f6a9_X9e67_target.ms).

I don't know where this is coming from; there's no reason for these scripts to have changed.

d-l-walker commented 10 months ago

This SB still needs to be updated to undo size mitigation (see #179).

nbudaiev commented 7 months ago

QA - Line contamination in continuum images from high/low frequencies

Looks okay? Maybe some contamination in spw25_27. Several compact sources are visible in spw25_27, but not in spw33_35.

uid___A001_X15a0_Xc4 s36_0 Sgr_A_star_sci spw33_35 cont I iter1 image tt0-uid___A001_X15a0_Xc4 s36_0 Sgr_A_star_sci oldhigh_spw33_35 cont I iter1 image tt0-uid___A001_X15a0_Xc4 s36_0 Sgr_A_star_sci sp-2024-02-28-12-31-37

uid___A001_X15a0_Xc4 s38_0 Sgr_A_star_sci spw35 cube I iter1 image pbcor statcont contsub_diagnostic_spectra

nbudaiev commented 6 months ago

QA - Line contamination in continuum images from high/low frequencies (compared against v1.1)

Looks great.

uid___A001_X15a0_Xc4 s36_0 Sgr_A_star_sci spw33_35 cont I iter1 image tt0 pbcor-uid___A001_X15a0_Xc4 s36_0 Sgr_A_star_sci v1 1_20240314_high_spw33_35 cont I iter1 image tt0 pbcor-uid___A001_X15a0_Xc4 -2024-04-03-00-32-46

uid___A001_X15a0_Xc4 s38_0 Sgr_A_star_sci spw35 cube I iter1 image_diagnostic_spectra

d-l-walker commented 6 months ago

Reminder that SPWs 25,27,29,31,35 all need to be un-size-mitigated and re-stat-cont-ed @keflavich

keflavich commented 6 months ago

Moved the files with mv *spw{25,27,29,31,35}* sizemitigated/ and a rerun is forthcoming.