Open keflavich opened 2 years ago
Compact bright spots in swp 25 - more SiO masers?
findcont: spw33 has LowBW warning. No visible issues with other SPWs. Full continuum: no issues.
SPW cubes:
- spw31: some structure in residuals; some structure in line-free mom8.
- spw35: tclean stopped to prevent divergence (stop code 5). Field: Sgr_A_star, SPW: 35.
No issues with other SPWs.
spw35 cube divergence: Divergence happened only in channel 515. The rest of the cube looks good.
Nothing major here. Main thing is that SPW 35 diverged and will need to be recleaned.
Data look really good. Next steps:
- Reclean SPW 35
- Re-run pipeline without size mitigation
All cubes will benefit a lot from combination with 7m & TP.
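Divergent channels like the one in spw 35 stand out as per-channel peaks that are orders of magnitude above the rest of the cube. A minimal numpy sketch of that check (the threshold factor and the toy cube are illustrative, not part of the pipeline):

```python
import numpy as np

def find_divergent_channels(cube, factor=100.0):
    """Return channel indices whose peak |value| exceeds
    `factor` times the median per-channel peak."""
    peaks = np.nanmax(np.abs(cube), axis=(1, 2))  # peak per channel
    median_peak = np.nanmedian(peaks)
    return np.where(peaks > factor * median_peak)[0]

# toy cube: 8 channels, channel 5 "diverged"
rng = np.random.default_rng(0)
cube = rng.normal(0, 1e-3, size=(8, 16, 16))
cube[5] += 10.0  # runaway channel
print(find_divergent_channels(cube))  # -> [5]
```

The same scan works on a real cube loaded with e.g. `spectral_cube` or `astropy.io.fits`.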
@piposona could you upload the completed hi-res reimaging products?
Uploaded to /upload/Repipelined_member.uid___A001_X15a0_X190/ Let me know if that is ok and I will delete it from our cluster.
At a glance, it looks good! I'll move it over into the right directories soon
Hi, I recleaned spw 35 of 12 m EB uid://A001/X15a0/X190. The divergence in channel 515 was resolved. The results look fine. I will upload the reclean cube.
Left: pipeline product, Right: reclean
I used the pipeline-selected line-free channels to subtract the continuum. The tclean parameters are below (almost the same as the pipeline parameters, except for the automasking parameters).
```python
ms = 'concat.spw35.contsub'
phasecenter = 'ICRS 17:46:03.1636 -028.39.24.691'
imagename = 'uid___A001_X15a0_X190.s38_0.Sgr_A_star_sci.spw35.cube.I.iter0'
restfreq = '100.5GHz'
imsize = [1024, 2592]
cell = '0.28arcsec'

tclean(vis=ms, imagename=imagename, field='',
       intent='OBSERVE_TARGET#ON_SOURCE', phasecenter=phasecenter,
       restfreq=restfreq, spw='0~1', threshold='10mJy',
       imsize=imsize, cell=cell, niter=10000000, cycleniter=250,
       start='99.5625613617GHz', width='0.9764731MHz', nchan=1916,
       outframe='LSRK', deconvolver='hogbom', weighting='briggsbwtaper',
       robust=0.5, specmode='cube', restoringbeam='common',
       gridder='mosaic', parallel=True, usemask='auto-multithresh',
       sidelobethreshold=2.0, noisethreshold=3.25, negativethreshold=10.0,
       lownoisethreshold=1.5, smoothfactor=1.0, minbeamfrac=0.3,
       cutthreshold=0.01, growiterations=75, interactive=False)
```
The reclean cubes were uploaded to:
/upload/uid_A001_X15a0_X190/
override_tclean_commands.json was updated for reclean of spw 35.
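For the record, updating a JSON override file programmatically avoids hand-editing errors. The schema below (one entry of tclean keyword overrides per spw label) is hypothetical; the real override_tclean_commands.json may be structured differently:

```python
import json
import os
import tempfile

# hypothetical schema: spw label -> tclean keyword overrides;
# the actual override_tclean_commands.json layout may differ
overrides = {"spw35": {"threshold": "10mJy", "cycleniter": 250}}

path = os.path.join(tempfile.mkdtemp(), "override_tclean_commands.json")
with open(path, "w") as fh:
    json.dump(overrides, fh, indent=2)

with open(path) as fh:
    loaded = json.load(fh)
print(loaded["spw35"]["cycleniter"])  # -> 250
```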
the reclean of spw33 is disastrous - both my attempt and @pyhsiehATalma 's look like the data were totally uncalibrated. So maybe they were.
left is the "product", right is reclean
@keflavich this looks weird, I will look into spw 33.
@keflavich @d-l-walker
I re-cleaned spw 33. The results look consistent with the product. I am not sure of the status of these execution blocks, but shall I update the tclean script by submitting a pull request? The name of the re-cleaned cube is "...iter1..".
The calibrated/ data for this field were still completely screwed up, so I'm re-running everything here from scratch again.
There is a recleaned product of spw35 on disk, but its parameters are wrong.
https://github.com/ACES-CMZ/reduction_ACES/pull/274 is a proposed fix. @pyhsiehATalma could you verify that these are the correct parameters? Specifically, when you re-ran the clean, what was nchan?
The cube imaging parameters for SPWs 25, 27, 29, and 31 were still those for the size mitigated products. I've updated these in #300.
Moved old cubes to cubes_pre20221116/
(including spw35 reclean, which may not be necessary). All reimaging jobs are running.
@keflavich I was just about to mark this one as done, but I'm having trouble finding the cube for SPW 33. All other SPWs have been cleaned at the native spectral resolution, and can be found in the ~/member.uid___A001_X15a0_X190/calibrated/working/ folder, but I don't see SPW 33.
It's included in the override_tclean_commands file, so I'm not sure why it's not there. Is it still running after all this time? I see that the SPW 35 cube was only finished a few days ago, while the other SPWs finished back in November ...
It's in day 10^6 of "still imaging".
I'm growing concerned that this might not be finishing a major cycle in <96h. Might need to look into this further.
😬 I've been encountering this issue with the array combination. Some of the more complex regions can take many weeks, often failing in the end anyway 🙃
Options to consider: (1) Maybe a higher cyclethreshold to trigger the major cycle sooner, so it spends more time on the major cycle and less on the minor cycle? Seems... unlikely to work? (2) More memory? not clear this is the bottleneck (3) MPI ⚡ominous thunderclap ⚡
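For reference on option (1): in CASA's imaging loop, the minor cycle hands back to the major cycle when the peak residual drops below a cycle threshold that scales as cyclefactor × (peak PSF sidelobe level) × (peak residual), so raising cyclefactor forces earlier, more frequent major cycles. A toy illustration of that scaling (simplified; real tclean applies additional floors and safeguards):

```python
def cycle_threshold(peak_residual, psf_sidelobe, cyclefactor):
    """Simplified CASA-style minor-cycle stopping threshold:
    cyclefactor * peak PSF sidelobe * current peak residual."""
    return cyclefactor * psf_sidelobe * peak_residual

peak, sidelobe = 0.5, 0.2  # Jy/beam peak residual, fractional sidelobe level
print(cycle_threshold(peak, sidelobe, 1.0))  # default cyclefactor -> 0.1
print(cycle_threshold(peak, sidelobe, 2.5))  # boosted -> 0.25, earlier major cycle
```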
Looks like SPW 33 still hasn't completed. Not sure if we just leave this and ... hope it finishes eventually?
spw33:
```
[c0712a-s3:63904] *** Process received signal ***
[c0712a-s3:63904] Signal: Segmentation fault (11)
[c0712a-s3:63904] Signal code: (128)
[c0712a-s3:63904] Failing at address: (nil)
```
😱
spw35 is plugging along happily. After 5 hours, it's 574 chunks in (is a chunk 1 channel or many? not sure, but Subcubes: 1791).
```
$ fgrep "Run Major Cycle 1" *60575694* | wc
    574    5740  133742
```
spw33 is half as many chunks in, same number of subcubes:
```
$ fgrep "Run Major Cycle 1" *60575693* | wc
    286    2860   66638
```
and I suspect it will die.
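Extrapolating the chunk counts above gives a rough completion estimate. A naive linear projection (assuming one "Run Major Cycle 1" log line per subcube and a constant rate, which long-running cleans rarely honor):

```python
def eta_hours(done, total, elapsed_hours):
    """Naive linear extrapolation of remaining runtime."""
    rate = done / elapsed_hours          # chunks per hour so far
    return (total - done) / rate         # hours left at that rate

# spw35: 574 of 1791 subcubes done after 5 h
print(round(eta_hours(574, 1791, 5.0), 1))  # -> 10.6
```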
@keflavich I'm guessing still no luck with SPW 33 for this field? Did you set it to run again after the previous segfault? I think this is the only outstanding issue for this region, so hopefully we can get this sorted and get it marked as done. Let me know if there's any way I can help with it.
Nope, it's running Major Cycle 1 again
Looks like we finally have a SPW 33 cube now. Downloading now to check it out.
Note that we have these two cubes for SPW 35:
uid___A001_X15a0_X190.s38_0.Sgr_A_star_sci.spw35.cube.I.iter1.image.pbcor.fits
uid___A001_X15a0_X190.s38_0.Sgr_A_star_sci.spw35.cube.I.iter1.reclean.image.pbcor.fits
I guess we want to move the old one and rename the recleaned one for consistency? @keflavich
After taking an eternity to clean, SPW 33 diverged in a single channel (627) :(
This was done with the default cyclefactor value. I'll update this in a PR soon.
@keflavich SPW 33 cyclefactor increase in #377. Please re-run SPW 33 cleaning once merged. [~/member.uid___A001_X15a0_X190/calibrated/working/]
moved files, restarted
SPW 33 looks good now, no divergence. Finally marking this one as done.
There still seems to be an issue with SPW35 having a divergent channel. The divergence should have been fixed. https://github.com/ACES-CMZ/reduction_ACES/issues/41#issuecomment-1090738721
This is in the .reclean.image file, dated March 15, 2023:
As of today, the re-cleaned cubes on disk still had this divergence. So it is likely that we need to update the clean parameters and try again. I'm giving one more shot at freshly recleaning this before making that modification, though. In the parallel cleaning, the cube starting at channel 1024 has the divergent channel
I'm boosting the cyclefactor to 2.5 for spw35 and re-running; we still have that one divergent channel 1034:
Just posting the image of the low-level divergence for the record (channel 645).
QA - Line contamination in continuum images from high/low frequencies
Summary: both files look reasonably good (although large-scale emission is missing)
Files checked:
Results: Both spw33_35 and spw25_27 look reasonably good (no obvious contamination). There is a lot of missing flux generating negative lobes, but the structures are very consistent between the two frequency ranges.
Attached image: (zoom to the bottom area of the Brick)
Apparently tclean worked when run with the previous parameter set, but now tclean consistently segfaults any time I try to run the full aggregate or the low frequency imaging.
Maybe this has something to do with it: there appears to be an entirely missing spectral window
The missing window was 27. Maybe remaking it fixes this? (The yellow highlight is the old spw selection.)
reclean looks good
Going to try to split the measurement set by its spws:
```python
split(vis='/orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_X190/calibrated/working/uid___A002_Xf53eeb_X323e.ms',
      outputvis='uid___A002_Xf53eeb_X323e_target_low.ms',
      field='Sgr_A_star',
      spw=("25:85.96741525086293~85.96839172755016GHz;85.9693682042374~86.04358043246751GHz;86.04724222004465~86.05065988844999GHz;86.05163636513723~86.08190714244161GHz;86.08605716836236~86.15172522557914GHz;86.15270170226638~86.156363"
           "48984351GHz;86.15733996653076~86.17052240180847GHz;86.17149887849571~86.2020137749719GHz;86.20884911178257~86.21080206515704GHz;86.21299913770333~86.21373149521875GHz;86.21617268693686~86.21885799782676GHz;86.22544921546562~86.23"
           "25286714481GHz;86.23643457819705~86.2378992932279GHz;86.25108172850561~86.26328768709611GHz;86.26426416378334~86.29062903433876GHz;86.291605511026~86.29722025197762GHz;86.30332323127287~86.3042997079601GHz;86.3091820913963~86.312"
           "59975980163GHz;86.35507649569648~86.36020299830449GHz;86.36117947499172~86.37680310198755GHz;86.37802369784659~86.38046488956469GHz;86.38144136625192~86.38534727300087GHz;86.38632374968812~86.39730911241955GHz;86.39828558910678~8"
           "6.42513869800582GHz;86.42611517469307~86.43392698819098GHz,27:86.66736581487878~86.71374845752838GHz;86.71472493421574~86.71765436427782GHz;86.71863084096519~86.72717501197958GHz;86.72815148866695~86.73401034879109GHz;86.75427224"
           "005382~86.7547604783975GHz;86.75573695508486~86.75622519342853GHz;86.75720167011589~86.75768990845958GHz;86.75988698100613~86.76135169603718GHz;86.76232817272454~86.77013998622341GHz;86.77111646291078~86.80431667028103GHz;86.8240"
           "9032320007~86.82457856154375GHz;86.86534646324104~86.86900825081864GHz;86.869984727506~86.87804066017671GHz;86.87901713686408~86.88023773272327GHz;86.88145832858247~86.90611436493832GHz;86.90709084162567~86.90879967582856GHz;86.9"
           "0977615251592~86.98618545330186GHz;86.98716192998921~87.053074106386GHz;87.05405058307336~87.06894185255561GHz;87.06991832924297~87.07040656758664GHz;87.09457436559882~87.0950626039425GHz;87.0970155573172~87.11874216361097GHz;87."
           "11971864029833~87.13094812220298GHz;87.13192459889034~87.13387755226506GHz"),
      )
```
```python
split(vis='/orange/adamginsburg/ACES/data/2021.1.00172.L/science_goal.uid___A001_X1590_X30a8/group.uid___A001_X1590_X30a9/member.uid___A001_X15a0_X190/calibrated/working/uid___A002_Xf531c1_X16b2.ms',
      outputvis='uid___A002_Xf531c1_X16b2_target_low.ms',
      field='Sgr_A_star',
      spw=("25:85.96741525086293~85.96839172755016GHz;85.9693682042374~86.04358043246751GHz;86.04724222004465~86.05065988844999GHz;86.05163636513723~86.08190714244161GHz;86.08605716836236~86.15172522557914GHz;86.15270170226638~86.156363"
           "48984351GHz;86.15733996653076~86.17052240180847GHz;86.17149887849571~86.2020137749719GHz;86.20884911178257~86.21080206515704GHz;86.21299913770333~86.21373149521875GHz;86.21617268693686~86.21885799782676GHz;86.22544921546562~86.23"
           "25286714481GHz;86.23643457819705~86.2378992932279GHz;86.25108172850561~86.26328768709611GHz;86.26426416378334~86.29062903433876GHz;86.291605511026~86.29722025197762GHz;86.30332323127287~86.3042997079601GHz;86.3091820913963~86.312"
           "59975980163GHz;86.35507649569648~86.36020299830449GHz;86.36117947499172~86.37680310198755GHz;86.37802369784659~86.38046488956469GHz;86.38144136625192~86.38534727300087GHz;86.38632374968812~86.39730911241955GHz;86.39828558910678~8"
           "6.42513869800582GHz;86.42611517469307~86.43392698819098GHz,27:86.66736581487878~86.71374845752838GHz;86.71472493421574~86.71765436427782GHz;86.71863084096519~86.72717501197958GHz;86.72815148866695~86.73401034879109GHz;86.75427224"
           "005382~86.7547604783975GHz;86.75573695508486~86.75622519342853GHz;86.75720167011589~86.75768990845958GHz;86.75988698100613~86.76135169603718GHz;86.76232817272454~86.77013998622341GHz;86.77111646291078~86.80431667028103GHz;86.8240"
           "9032320007~86.82457856154375GHz;86.86534646324104~86.86900825081864GHz;86.869984727506~86.87804066017671GHz;86.87901713686408~86.88023773272327GHz;86.88145832858247~86.90611436493832GHz;86.90709084162567~86.90879967582856GHz;86.9"
           "0977615251592~86.98618545330186GHz;86.98716192998921~87.053074106386GHz;87.05405058307336~87.06894185255561GHz;87.06991832924297~87.07040656758664GHz;87.09457436559882~87.0950626039425GHz;87.0970155573172~87.11874216361097GHz;87."
           "11971864029833~87.13094812220298GHz;87.13192459889034~87.13387755226506GHz"),
      )
```
The split version worked for field ak and I'm now running this imaging for ao and it's working. https://github.com/ACES-CMZ/reduction_ACES/pull/418 now incorporates this fix generally
@mpound noted a problem in the continuum selection - spw25+27 was infected by an SiO maser.
The maser in question is this:
it is 2.8 mJy in the continuum, 2.5 Jy in the line data, which means it's diluted by ~1000x, or roughly half the total channels in the cube (which is probably roughly the number of non-flagged channels?).
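The dilution arithmetic above checks out: the ratio of the line peak to its residual level in the aggregate continuum is of order the number of channels averaged together. A quick check (values taken from the comment above):

```python
line_peak_jy = 2.5      # SiO maser peak in the line cube (Jy)
cont_level_jy = 2.8e-3  # same source in the aggregate continuum (Jy)

dilution = line_peak_jy / cont_level_jy
print(round(dilution))  # -> 893, i.e. ~1000x, of order the channel count
```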
There are two EBs that contribute to this image. They have wildly different continuum selections, which is not good. These are the relevant excerpts from the continuum selection:
Xf531c1_X16b2: 86.23643457819705~86.2378992932279GHz;86.25108172850561~86.26328768709611GHz
Xf53eeb_X323e: 86.2303796649618~86.237459525754GHz;86.24136565584625~86.24283045463083GHz;86.25601364369217~86.26822030023045GHz;
The first one, 16b2, excludes 86.23789 - 86.25108 - i.e., the entire line pictured above. X323e includes most of that. That's a huge problem! Why? (investigation continues...)
Sgr_A_st_ao_03_TM1 uid://A001/X15a0/X190
Product Links:
Reprocessed Product Links: