Closed o-smirnov closed 4 years ago
I'm therefore testing my version of Caracal with (I assume) Stimela release, correct?
Yes. You can check by looking at the file system_info.txt in $testname
My question is, how the hell does this work for anyone? I checked yesterday (I thought)
@o-smirnov, is this the first round of selfcal?
For what it is worth, I ran through with caracal master, with stimela master (not release) and it goes through fine. Note that this is not with caratekit but a standalone test.
Hmm, maybe I wasn't running the tests I thought I was running. I invoked caratekit like so:
caratekit.sh -ws $workspace -td $testdata -lc $meerkathi -ct $testname -dm -or -f
...but forgot to set `testname`. So all my output went to a directory called `-dm`, while the `-dm` option was, of course, ignored. Not sure what test that results in -- and doesn't explain the absolutely bizarre error message! -- but does explain why the rest of you don't see the same result perhaps.
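To make the mechanism concrete, here is a minimal Python sketch of what happens when an unset variable sits between `-ct` and `-dm`. It is not caratekit's actual parser: `expand_like_shell`, `naive_parse`, and the `VALUE_OPTIONS` set are hypothetical names, and the assumption that `-ws`, `-td`, `-lc` and `-ct` each take one value is read off the command line above.

```python
import shlex
from collections import defaultdict
from string import Template

# Assumption: these options each consume the next token as their value.
VALUE_OPTIONS = {"-ws", "-td", "-lc", "-ct"}


def expand_like_shell(arguments, env):
    """Expand $name references; unset names vanish, as with unquoted shell expansion."""
    lookup = defaultdict(str, env)  # a missing variable expands to ""
    return shlex.split(Template(arguments).substitute(lookup))


def naive_parse(tokens):
    """Hypothetical option parser: a value option swallows whatever token follows it."""
    values, flags = {}, []
    remaining = iter(tokens)
    for token in remaining:
        if token in VALUE_OPTIONS:
            values[token] = next(remaining, None)
        else:
            flags.append(token)
    return values, flags


env = {"workspace": "/ws", "testdata": "/td", "meerkathi": "/src"}  # testname is unset
tokens = expand_like_shell(
    "-ws $workspace -td $testdata -lc $meerkathi -ct $testname -dm -or -f", env
)
values, flags = naive_parse(tokens)
print(values["-ct"], flags)
```

With `testname` unset, the value of `-ct` becomes the literal string `-dm`, and `-dm` never shows up as a flag.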
Hoezeeee, I was worried it was my new and wonderful invocation of cubical. Should we change the title then to "caratekit.sh crashes when you invoke it with undeclared shell variables"? Or just close this issue?
Let's wait until @o-smirnov confirms with the correct caratekit test?
Should we change the title then to caratekit.sh crashes when you invoke it with undeclared shell variables?
Shell variables have nothing to do with it -- all I did was (in effect) invoke it with `-ct -dm`, i.e., with a somewhat funny (but entirely legit) output test directory name. It looks to have defaulted to a minimal test anyway. And crashed in cubical.
Running it with a boring test name results in the same crash:
~/projects/caracal/caratekit.sh -ws $workspace -td $testdata -lc ~/projects/caracal/ -ct test1 -dm -or -f
I'll try it with `-da` now.
Yes. You can check by looking at the file system_info.txt in $testname
@gigjozsa, I see no such file under my $testname folder:
oms@young:~/projects/caratekit/test1$ ls -lrt
total 24
drwxrwxr-x 6 oms oms 4096 Apr 4 19:33 meerkathi
drwxrwxr-x 7 oms oms 4096 Apr 22 20:37 caracal
drwxrwxr-x 7 oms oms 4096 Apr 22 20:37 caracal_venv
drwxrwxr-x 6 oms oms 4096 Apr 22 20:39 home
drwxrwxr-x 6 oms oms 4096 Apr 22 20:49 minimalConfig-docker
drwxrwxr-x 3 oms oms 4096 Apr 22 20:49 report
oms@young:~/projects/caratekit/test1$ ls -lrt */*txt
-rw-rw-r-- 1 oms oms 30 Apr 4 19:33 meerkathi/stimela_master.txt
-rw-rw-r-- 1 oms oms 95 Apr 4 19:33 meerkathi/stimela_last_stable.txt
-rw-rw-r-- 1 oms oms 54 Apr 22 20:37 caracal/stimela-master.txt
-rw-rw-r-- 1 oms oms 94 Apr 22 20:37 caracal/stimela-last_stable.txt
oms@young:~/projects/caratekit/test1$ ls -lrt */*/*txt
-rw-rw-r-- 1 oms oms 10 Apr 4 19:33 meerkathi/meerkathi.egg-info/top_level.txt
-rw-rw-r-- 1 oms oms 21805 Apr 4 19:33 meerkathi/meerkathi.egg-info/SOURCES.txt
-rw-rw-r-- 1 oms oms 177 Apr 4 19:33 meerkathi/meerkathi.egg-info/requires.txt
-rw-rw-r-- 1 oms oms 1 Apr 4 19:33 meerkathi/meerkathi.egg-info/dependency_links.txt
-rw-rw-r-- 1 oms oms 8 Apr 22 20:37 caracal/caracal.egg-info/top_level.txt
-rw-rw-r-- 1 oms oms 5385 Apr 22 20:37 caracal/caracal.egg-info/SOURCES.txt
-rw-rw-r-- 1 oms oms 177 Apr 22 20:37 caracal/caracal.egg-info/requires.txt
-rw-rw-r-- 1 oms oms 1 Apr 22 20:37 caracal/caracal.egg-info/dependency_links.txt
-rw-rw-r-- 1 oms oms 1457 Apr 22 20:43 report/minimalConfig-docker/minimalConfig-docker.yml.txt
-rw-rw-r-- 1 oms oms 1215 Apr 22 20:43 report/minimalConfig-docker/minimalConfig-docker.sh.txt
lrwxrwxrwx 1 oms oms 31 Apr 22 20:43 minimalConfig-docker/output/log-caracal.txt -> log-caracal-20200422-204318.txt
-rw-rw-r-- 1 oms oms 5966 Apr 22 20:43 minimalConfig-docker/input/mk64.txt
-rw-r--r-- 1 oms oms 4023 Apr 22 20:43 minimalConfig-docker/output/1477074305.subset-obsinfo.txt
-rw-r--r-- 1 oms oms 3544 Apr 22 20:43 minimalConfig-docker/output/1477074305.subset_cal-obsinfo.txt
-rw-r--r-- 1 oms oms 3274 Apr 22 20:48 minimalConfig-docker/output/1477074305.subset-IC5264_corr-obsinfo.txt
-rw-rw-r-- 1 oms oms 1026187 Apr 22 20:49 minimalConfig-docker/output/log-caracal-20200422-204318.txt
-rw-rw-r-- 1 oms oms 1026187 Apr 22 20:49 report/minimalConfig-docker/minimalConfig-docker-log-caracal.txt
-rw-rw-r-- 1 oms oms 1933 Apr 22 20:49 report/minimalConfig-docker/minimalConfig-docker-sysinfo.txt
oms@young:~/projects/caratekit/test1$
@o-smirnov That is rather disappointing. I'm running the same test but without -ws and -td as I have set them through the env, but am nowhere near the selfcal yet. I do get these wonderful messages though:
/Pipelines/Caracal/carate_test/test1/caracal_venv/lib/python3.6/site-packages/caracal/workers/inspect_data_worker.py:207: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
corrs = yaml.load(stdr)['CORR']['CORR_TYPE']
2020-04-22 21:09:04 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:06 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'
2020-04-22 21:09:09 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:11 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'
2020-04-22 21:09:14 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:16 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'
Yeah. I get those too, outside of the kit. Something to fix, but the pipeline does continue.
Yeah ignore those, I'm going to revamp the inspect worker anyway, now that shadems can plot the Universe versus God coloured by Eternity.
I'm more worried about my continuing test failures. Running with `-da` (full test, right @gigjozsa?), I get this:
# --> CrashReporter initialized.
# 2020-04-22 19:17:56 INFO gaincal::::
# 2020-04-22 19:17:56 INFO gaincal::::+ ##########################################
# 2020-04-22 19:17:56 INFO gaincal::::+ ##### Begin Task: gaincal #####
# 2020-04-22 19:17:56 INFO gaincal:::: gaincal(vis="/stimela_mount/msdir/1477074305.subset_cal.ms",caltable="/stimela_mount/output/mypipelinerun-1477074305.subset-1gc1_primary_cal.K0",field="PKS1934-638",spw="",intent="",
# 2020-04-22 19:17:56 INFO gaincal::::+ selectdata=True,timerange="",uvrange="",antenna="",scan="",
# 2020-04-22 19:17:56 INFO gaincal::::+ observation="",msselect="",solint="inf",combine="",preavg=-1.0,
# 2020-04-22 19:17:56 INFO gaincal::::+ refant="m010",refantmode="flex",minblperant=4,minsnr=3.0,solnorm=False,
# 2020-04-22 19:17:56 INFO gaincal::::+ normtype="mean",gaintype="K",smodel=[],calmode="ap",solmode="",
# 2020-04-22 19:17:56 INFO gaincal::::+ rmsthresh=[],append=False,splinetime=3600.0,npointaver=3,phasewrap=180.0,
# 2020-04-22 19:17:56 INFO gaincal::::+ docallib=False,callib="",gaintable=[],gainfield=[],interp=[],
# 2020-04-22 19:17:56 INFO gaincal::::+ spwmap=[],parang=False)
# 2020-04-22 19:17:56 INFO gaincal::calibrater::open ****Using NEW VI2-driven calibrater tool****
# 2020-04-22 19:17:56 INFO gaincal::calibrater::open Opening MS: /stimela_mount/msdir/1477074305.subset_cal.ms for calibration.
# 2020-04-22 19:17:56 INFO gaincal::Calibrater:: Initializing nominal selection to the whole MS.
# 2020-04-22 19:17:56 INFO gaincal:::: NB: gaincal automatically excludes auto-correlations.
# 2020-04-22 19:17:56 INFO calibrater::setdata Beginning selectvis--(MSSelection version)-------
# 2020-04-22 19:17:56 INFO calibrater::reset Reseting solve/apply state
# 2020-04-22 19:17:56 INFO Calibrater::selectvis Performing selection on MeasurementSet
# 2020-04-22 19:17:56 INFO Calibrater::selectvis+ Selecting on field: 'PKS1934-638'
# 2020-04-22 19:17:56 INFO Calibrater::selectvis+ Selecting with TaQL: 'ANTENNA1!=ANTENNA2'
# 2020-04-22 19:17:56 INFO Calibrater::selectvis By selection 4675 rows are reduced to 3195
# 2020-04-22 19:17:56 INFO Calibrater::selectvis Frequency selection: Selecting all channels in all spws.
# 2020-04-22 19:17:56 INFO calibrater::setdata chanmode=none nchan=1 start=0 step=1 mStart='0km/s' mStep='0km/s' msSelect='ANTENNA1!=ANTENNA2'
# 2020-04-22 19:17:56 INFO calibrater::setsolve Beginning setsolve--(MSSelection version)-------
# 2020-04-22 19:17:56 SEVERE Exception Reported: Antenna Expression: No match found for token(s) "m010"
# *** Error *** Antenna Expression: No match found for token(s) "m010"
# 2020-04-22 19:17:56 SEVERE gaincal:::: Error in gaincal: Antenna Expression: No match found for token(s) "m010"
# 2020-04-22 19:17:56 SEVERE gaincal:::: An error occurred running task gaincal.
# Traceback (most recent call last):
No such antenna m010, but from the other messages I suspect it's still using the subset MS. Maybe I didn't unpack rawdata.tar correctly. But how come it didn't fail completely? I thought `-da` implies the full test -- if it doesn't see the full data, should it not fail?
Maybe because the test data isn't full pol?
No such antenna m010.
The test dataset doesn't have all the antennas, only half.
@KshitijT but shouldn't we detect an incorrect refant before the gaincal?
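A pre-flight check along these lines could look like the sketch below. It is only an illustration: `resolve_refant` is a hypothetical helper, not something that exists in Caracal, and the antenna names are passed in as a plain list (in a real run they would presumably be read from the MS ANTENNA table). Treating a numeric refant as an index into that list is also an assumption.

```python
def resolve_refant(refant, antenna_names):
    """Return the antenna name for refant, or raise before any calibration task runs.

    refant may be an antenna name ("m010") or, by assumption, a numeric index ("0").
    """
    names = list(antenna_names)
    if refant in names:
        return refant
    if refant.isdigit() and int(refant) < len(names):
        return names[int(refant)]
    raise ValueError(
        f"Reference antenna {refant!r} not found; available antennas: {', '.join(names)}"
    )


# Illustrative subset array with only a few antennas present.
subset_antennas = ["m000", "m001", "m002", "m003"]
print(resolve_refant("0", subset_antennas))
```

Calling `resolve_refant("m010", subset_antennas)` raises a `ValueError` with a readable message instead of letting gaincal fail later.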
So, for me refant is set to '0' and it goes for 'm000' but I was using @gigjozsa 's test data. This seems to be IC5264 data (the very tiny one Ben shared). Let me recheck with the IC5264 data.
Maybe because the test data isn't full pol?
A-ha, now that's a thought! The offending code in CubiCal does do some funny shape calculations. Maybe it gets confused by the subset test data...
Anyway, I'd still like to understand from @gigjozsa how exactly it selects test data. I naively assumed `-dm` takes the subset and `-da` takes the full data, but it looks like I managed to start `-da` on the subset somehow. Is there an implicit selection happening?
@o-smirnov The cubical crash happened in the first iteration of the first set?
Because I just got:
2020-04-22 21:36:03 CARACal.Stimela.calibrate_cubical_1_0 INFO: job complete at 2020-04-22 21:36:03.772850 after 0:03:36.917910
I noticed though that you call everything with the ~ which I avoid out of habit in bash.
test invoked with caratekit.sh -lc /home/peter/Pipelines/Caracal/caracal -ct test1 -dm -or -f
@PeterKamphuis can you check your log and tell me what the MS name(s) were? I suspect we're using different ones...
The ~ can't be a problem -- bash will substitute it before caratekit even sees it...
My MS was `1477074305.subset-IC5264_corr.ms` when CubiCal failed.
@o-smirnov ah no I do not have those.
get_data:
  dataid: ['1524929477','1524947605','1532022061']
but these are not full polarization either so that should not be the issue for the crash. Unless somehow Caracal assumes your test set is full polarization while it isn't?
A-ha, now that's a thought! The offending code in CubiCal does do some funny shape calculations. Maybe it gets confused by the subset test data...
But that was for the test data I was using, not for yours, @o-smirnov . The one you are using is full pol.
EDIT: I was talking about the shadems errors.
But that was for the test data I was using, not for yours, @o-smirnov . The one you are using is full pol.
Maybe that is the problem then? I don't think I ever tested on full polarization data.
I'd still like to understand from @gigjozsa how exactly it selects test data
Sorry, did not get that before. With
-td directory
you supply the test data directory and any *.ms data set in that directory will be copied into the msdir of the test.
Then, the config file will be edited and all MSs found will be inserted into dataid.
So, whatever is in your directory will be processed. This means you have to take care that there is not more in that directory than you want to be processed. There is a way to suppress that behaviour, but it is not the default.
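The selection rule described here can be sketched in a few lines of Python. This is not caratekit's code (caratekit is a shell script): `collect_dataids` is a hypothetical name, and the assumption that a dataid is the MS name without the `.ms` suffix is taken from the `dataid` list quoted earlier in this thread.

```python
from pathlib import Path


def collect_dataids(testdata_dir):
    """Return the dataid of every *.ms entry found directly in testdata_dir.

    Everything matching *.ms is picked up, which is why stray data sets
    in the directory end up being processed as well.
    """
    return sorted(path.stem for path in Path(testdata_dir).glob("*.ms"))
```

For a directory holding `1477074305.subset.ms` and a leftover `old_run.ms`, both names come back, so both would be written into `dataid`.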
Ok, I just reproduced the error with a standalone test, so probably nothing to do with caratekit. I used minimal config and @o-smirnov's test dataset.
So, whatever is in your directory will be processed.
Yep, I was beginning to suspect that, thanks for confirming. I think that might be a little too clever. I think the default should be minimal test = subset data, full test = full data (or at least document the clever behaviour prominently up front -- as a naive user I definitely did not expect it!) In any case, I've discovered that the full test is not compatible with the subset data as things stand, so unless we change the full test config to be compatible with the subset data, we shouldn't let it even try.
So to summarize the current status:
* `-da` with full data is running now, and I'll report when it finishes.
* `-da` with subset data crashes in gaincal (for me) with unknown antenna m010
* `-dm` with full data works (for @PeterKamphuis), I haven't tried it yet
* `-dm` with subset data crashes in CubiCal (for me), could someone please try this for themselves?
Let's please discuss shadems errors in #972, this issue is already confusing enough
Ok, I just reproduced the error with a standalone test, so probably nothing to do with caratekit. I used minimal config and @o-smirnov's test dataset.
@KshitijT And standalone Cubical?
By standalone I meant a caracal run without caratekit.
I understood but I meant can you try with a standalone cubical?
I'll do that too, it's a most curious error. But tomorrow. Going to go sulk in bed now. Thanks for the help, everyone.
Ah. I'll try, let me first see if this is a data / config issue. Because the other test datasets went through the same caracal/stimela combo fine, but were done with meerkat default config. I am running @o-smirnov 's dataset through the meerkat default config now, that should tell us if it is the data or the config. After that, I'll try out with Cubical separately.
Ok, reproduced the same error with the meerkat default config, so maybe the data is to blame. Funny, we have used it so many times without any trouble before.
On to standalone cubical test...
Same error with a standalone Cubical test.
Okay, I untarred the dataset freshly, ran manual wsclean imaging (no pipeline involved) and then tried to calibrate it with the same parset. Same error.
So at this point the stuff before selfcal in the pipeline has nothing to do with the error (since the above dataset was not even cross-calibrated). So either something is wrong with the cubical settings, or the dataset is the culprit, or something has changed in the cubical version since we last tested with this dataset.
Update: Calibration goes through with the default parset. So definitely something to do with cubical settings. EDIT: Actually a combination of the settings with this dataset, since other datasets are perfectly fine.
So this is the culprit: (https://github.com/ska-sa/caracal/blob/master/caracal/workers/self_cal_worker.py#L1153) and the corresponding line down for the interpolation:
"sel-diag": take_diag_terms,
where:
    if matrix_type == 'Gain2x2':
        take_diag_terms = False
    else:
        take_diag_terms = True
I set `"sel-diag": False`, and the pipeline goes through fine. I guess we should make this dependent on the data type?
Since the defaults are phase-diag etc., `take_diag_terms` is by default set to True in the pipeline. I guess for full-pol data (like the one @o-smirnov used) these settings are not appreciated, while for the other test data, which has just the diagonal correlations, it's fine.
Mystery solved?
@KshitijT Thank you for tracing this down. @o-smirnov Is this then a Cubical bug or a feature? I don't see immediately why for full polarization ignoring the off diagonal terms should not work. A second thing is whether it is the right thing to do for full polarization? I thought that we discussed that when not calibrating the full matrix it is better to set the off diagonal terms to 0. Maybe I misunderstood and that is only for when they are not present in the .ms?
Your error occurs on load as far as I could see so I don't think so, but I realized that `madmax-offdiag` is not set and its default is 1. Could that cause the problems?
So, whatever is in your directory will be processed.
Yep, I was beginning to suspect that, thanks for confirming. I think that might be a little too clever. I think the default should be minimal test = subset data, full test = full data (or at least document the clever behaviour prominently up front -- as a naive user I definitely did not expect it!) In any case, I've discovered that the full test is not compatible with the subset data as things stand, so unless we change the full test config to be compatible with the subset data, we shouldn't let it even try.
So to summarize the current status:
* `-da` with full data is running now, and I'll report when it finishes.
* `-da` with subset data crashes in gaincal (for me) with unknown antenna m010
* `-dm` with full data works (for @PeterKamphuis), I haven't tried it yet
* `-dm` with subset data crashes in CubiCal (for me), could someone please try this for themselves?
Too much rope, I see, or rope too twisted. OK, for caratekit I will:
* add a dialogue asking the user to confirm if she wants to run an unusual combination of data and -da, -sa, -sa, or -sm. The usual user would not use those switches anyway. Any security question will be overridden by the -or switch, including this one.
* add a dialogue asking the user to confirm that dataid will be changed if it is not empty in any provided config file. Same thing, will not be done if -or is set.
* enable a mode where only data are copied across which are in the dataid of the config file, which automatically implies the -kc switch to be set (which is the do-not-change-the-config-file switch)
@PeterKamphuis of course this is not a feature lol. Unless you consider perverse error messages as feature! Voila (and thanks @KshitijT for narrowing it down): https://github.com/ratt-ru/CubiCal/issues/368
enable a mode where only data are copied across which are in the dataid of the config file, which automatically implies the -kc switch to be set (which is the do-not-change-the-config-file switch)
Aha, I overlooked the `-kc` switch. I think you should edit the "standard use cases" issue, and add the switch there, because we definitely want to be doing the standard PR tests with that switch in place.
In the meantime, with `-dm` and full data far I got, far, but still on my face I fall:
2020-04-23 01:37:39 CARACal INFO: Corresponding pathnames are:
2020-04-23 01:37:39 CARACal INFO: ['output/cubes', 'output/cubes', 'output/cubes']
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p2_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p3_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p1_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: Checking for *pb.fits files now complete.
2020-04-23 01:37:39 CARACal INFO: Now creating symlinks to images and beams, in case they are distributed across multiple subdirectories
2020-04-23 01:37:40 CARACal INFO: mosaic: running
2020-04-23 01:37:40 CARACal.Stimela.mosaic-steward INFO: job started at 2020-04-23 01:37:40.159662
# 829d2dc2df40c2dcfefd9690fd3032323b87a7b1e6adaff605d7ffaa5319b9c4
# Traceback (most recent call last):
# File "/stimela_mount/code/run.py", line 20, in <module>
# cab = utils.readJson(CONFIG)
# NameError: name 'utils' is not defined
2020-04-23 01:37:41 CARACal.Stimela.mosaic-steward ERROR: docker returns error code 1
2020-04-23 01:37:41 CARACal.Stimela.mosaic-steward ERROR: job failed at 2020-04-23 01:37:41.880430 after 0:00:01.720768
2020-04-23 01:37:41 CARACal ERROR: Job 'MosaicSteward:: Re-gridding spectral images before mosaicking them. For this mode, the mosaic_worker is using *pb.fits files generated by the image_line_worker.' failed: docker returns error code 1 [PipelineException]
2020-04-23 01:37:41 CARACal INFO: More information can be found in the logfile at output/log-caracal-20200422-220253.txt
2020-04-23 01:37:41 CARACal INFO: exiting with error code 1
CARACal run carateConfig-docker returned an error.
Checking output of carateConfig docker test
@o-smirnov Well I meant more did I misunderstand the correct settings?
Just to have this clear, the setting in principle is correct? When we use `gain-update-type: full` we want `sel-diag: False` and whenever we have `gain-update-type: xx-diag` we want `sel-diag: True`. Also for full polarization?
I found the test subset dataset and tried to make a workaround for the bug, but to be honest this dataset should crash the selfcal worker in any case, as the first image has no sources in it. I remember this is why I never used it for testing.
Just to have this clear, the setting in principle is correct? When we use gain-update-type: full we want sel-diag: False and whenever we have gain-update-type: xx-diag we want sel-diag: True. Also for full polarization?
In principle, yes (there may be some more exotic combinations for polcal etc.), the default selfcal behaviour should be as you describe.
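As a compact restatement of that default rule, here is a small Python sketch. `sel_diag_for` is a hypothetical helper, not the code in self_cal_worker.py, and it deliberately covers only the two cases named above: an update type of `full`, and any update type ending in `diag` (read here as the "xx-diag" family). The more exotic polcal combinations are out of scope.

```python
def sel_diag_for(update_type):
    """Default sel-diag value for a given gain update type (selfcal defaults only)."""
    if update_type == "full":
        return False  # solving the full 2x2 matrix: keep the off-diagonal terms
    if update_type.endswith("diag"):
        return True   # diagonal-only solutions: select the diagonal terms
    raise ValueError(f"No default sel-diag rule for update type {update_type!r}")


for update_type in ("full", "phase-diag", "diag"):
    print(update_type, sel_diag_for(update_type))
```

Raising on anything else, rather than guessing, keeps unusual setups from silently inheriting a default that was only agreed for standard selfcal.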
Well I assume that for a polarization calibration you have to change heaps of settings as we also only produce stokes I images by default. Setting the matrix_type to Gain2x2 seems an obvious one. I'll tag @francescaLoi here as I understand she is looking at polarization calibration with Caracal.
So I made a temporary fix for the cubical bug in PR #990. @o-smirnov Do you want to keep this issue for this new error or should we close it?
Let's keep it open until the CubiCal bug is fixed and propagated into Stimela...
I give up on my `-dm` test (with full data) after 9h of crystalballing... has this worked for anyone?
I'll try `-da` over night.
Crystalball is an extremely time-consuming step. You really shouldn't try it on a large .MS unless you're interested in the science.
In my opinion there is no reason to test Caracal's PRs on large .MS files.
I've been running all my tests with all workers -- testing as many combinations of parameters as possible -- on my desktop machine using 3 ~200 MB .MS files, each with multiple targets. Typically, a Caracal run completes in ~1 h.
Holy shit, what did I do right?!!
If come from inside you, always right one.
Caratekit succeeded.
Can we please merge my PR quick quick before the stars un-align again?
In my opinion there is no reason to test Caracal's PRs on large .MS files.
Probably not, but then I'm confused. From @gigjozsa's diktat I understood we are to run caratekit fully before committing a PR. But the only combination that has worked for me so far has been `-da` with full data. Clarity needed!
The directive was to go easy on the full tests while we are in a busy phase, and to use it to convince ourselves that all's good. We can even go easy on this by believing you and not requiring an independent test.
@paoloserra, the good news is that I'm running the selfcal worker! The bad news, it's only the caratekit test. The even worse news is, it crashes inside cubical, in an utterly incomprehensible manner:
My question is, how the hell does this work for anyone? Is anyone running the caratekit tests successfully? Can someone else please try, with Caracal master?
@gigjozsa, my command line was:
...I'm therefore testing my version of Caracal with (I assume) Stimela release, correct?
The code in question cannot possibly fail. It's absurd. But since @bennahugo put it in, I'm tagging him and @JSKenyon. The offending line cannot possibly fail with "Argument 'nparray' has to be a contiguous numpy array" (because we've just allocated the damn array, haven't we?) in any universe that I'm comfortable with.