Test errors with some combinations of caratekit test settings and test MSs

o-smirnov commented 4 years ago

@paoloserra, the good news is that I'm running the selfcal worker! The bad news, it's only the caratekit test. The even worse news is, it crashes inside cubical, in an utterly incomprehensible manner:

# INFO       - 16:31:42 - data_handler       [io] [0.1/0.1 4.7/4.7 0.0Gb] reading DATA
# INFO       - 16:31:42 - main               [io] [0.1/0.1 4.7/4.7 0.0Gb] I/O handler for load 0 save None failed with exception: Argument 'nparray' has to be a contiguous numpy array
# INFO       - 16:31:42 - main               [io] [0.1/0.1 4.7/4.7 0.0Gb] Traceback (most recent call last):
#   File "/usr/local/lib/python3.6/dist-packages/cubical/workers.py", line 461, in _io_handler
#     tile.load(load_model=load_model)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/data_handler/ms_tile.py", line 697, in load
#     obvis0 = self.dh.fetchslice(self.dh.data_column, subset=table_subset).astype(self.dh.ctype)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/data_handler/ms_data_handler.py", line 847, in fetchslice
#     subset.getcolslicenp(str(column), prealloc, self._ms_blc, self._ms_trc, self._ms_incr, startrow, nrows)
#   File "/usr/local/lib/python3.6/dist-packages/casacore/tables/table.py", line 1100, in getcolslicenp
#     + "numpy array")
# ValueError: Argument 'nparray' has to be a contiguous numpy array
# INFO       - 16:31:42 - main               [0.2/0.2 4.9/5.2 0.0Gb] Exiting with exception: ValueError(Argument 'nparray' has to be a contiguous numpy array)
#  concurrent.futures.process._RemoteTraceback:
# """
# Traceback (most recent call last):
#   File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
#     r = call_item.fn(*call_item.args, **call_item.kwargs)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/workers.py", line 461, in _io_handler
#     tile.load(load_model=load_model)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/data_handler/ms_tile.py", line 697, in load
#     obvis0 = self.dh.fetchslice(self.dh.data_column, subset=table_subset).astype(self.dh.ctype)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/data_handler/ms_data_handler.py", line 847, in fetchslice
#     subset.getcolslicenp(str(column), prealloc, self._ms_blc, self._ms_trc, self._ms_incr, startrow, nrows)
#   File "/usr/local/lib/python3.6/dist-packages/casacore/tables/table.py", line 1100, in getcolslicenp
#     + "numpy array")
# ValueError: Argument 'nparray' has to be a contiguous numpy array
# """
# 
# The above exception was the direct cause of the following exception:
# 
# Traceback (most recent call last):
#   File "/usr/local/lib/python3.6/dist-packages/cubical/main.py", line 540, in main
#     stats_dict = workers.run_process_loop(ms, tile_list, load_model, single_chunk, solver_type, solver_opts, debug_opts, out_opts)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/workers.py", line 214, in run_process_loop
#     return _run_multi_process_loop(ms, load_model, solver_type, solver_opts, debug_opts, out_opts)
#   File "/usr/local/lib/python3.6/dist-packages/cubical/workers.py", line 274, in _run_multi_process_loop
#     if not io_futures[itile].result():
#   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
#     return self.__get_result()
#   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
#     raise self._exception
# ValueError: Argument 'nparray' has to be a contiguous numpy array

My question is, how the hell does this work for anyone? Is anyone running the caratekit tests successfully? Can someone else please try, with Caracal master?

@gigjozsa, my command line was:

~/projects/caracal/caratekit.sh -ws $workspace -td $testdata -lc $meerkathi -ct $testname -dm -or -f

...I'm therefore testing my version of Caracal with (I assume) Stimela release, correct?

The code in question cannot possibly fail. It's absurd. But since @bennahugo put it in, I'm tagging him and @JSKenyon. The offending line cannot possibly fail with "Argument 'nparray' has to be a contiguous numpy array" (because we've just allocated the damn array, haven't we?) in any universe that I'm comfortable with.
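For readers unfamiliar with the message: the traceback shows `getcolslicenp` being handed a caller-supplied array (`prealloc`), and the error text says that array must be contiguous. The following is a minimal NumPy sketch of what "contiguous" means. It is illustrative only and makes no claim about how CubiCal actually builds its buffer; the array shape and the strided slice are assumptions chosen for the demonstration.

```python
import numpy as np

# Hypothetical buffer shaped (rows, channels, correlations); the sizes are made up.
buf = np.empty((10, 8, 4), dtype=np.complex64)
print(buf.flags["C_CONTIGUOUS"])   # a freshly allocated array is contiguous

# Picking correlations 0 and 3 out of 4 yields a strided view onto the same
# memory, and that view is no longer contiguous.
diag_view = buf[:, :, ::3]
print(diag_view.shape, diag_view.flags["C_CONTIGUOUS"])

# np.ascontiguousarray copies the view into a new contiguous block.
diag_copy = np.ascontiguousarray(diag_view)
print(diag_copy.flags["C_CONTIGUOUS"])
```

So "we've just allocated the array" and "the array is not contiguous" can both be true if anything slices or reshapes the buffer between allocation and the call.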

gigjozsa commented 4 years ago

I'm therefore testing my version of Caracal with (I assume) Stimela release, correct?

Yes. You can check by looking at the file system_info.txt in $testname

gigjozsa commented 4 years ago

My question is, how the hell does this work for anyone?

I checked yesterday (I thought)

KshitijT commented 4 years ago

@o-smirnov, is this the first round of selfcal?

KshitijT commented 4 years ago

For what it is worth, I ran through with caracal master, with stimela master (not release) and it goes through fine. Note that this is not with caratekit but a standalone test.

o-smirnov commented 4 years ago

Hmm, maybe I wasn't running the tests I thought I was running. I invoked caratekit like so:

caratekit.sh -ws $workspace -td $testdata -lc $meerkathi -ct $testname -dm -or -f

...but forgot to set testname. So all my output went to a directory called -dm, while the -dm option was, of course, ignored. Not sure what test that results in -- and doesn't explain the absolutely bizarre error message! -- but does explain why the rest of you don't see the same result perhaps.
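The mechanism described above (an empty, unquoted `$testname` disappears, so `-ct` takes `-dm` as its value) can be sketched with a toy parser. This is not caratekit's actual argument handling, which is a bash script; the function and the set of value-taking options below are assumptions made for illustration.

```python
def parse_args(argv):
    """Toy parser: value-taking options consume the next token, whatever it is."""
    takes_value = {"-ws", "-td", "-lc", "-ct"}  # assumed set, for illustration
    values, flags = {}, set()
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in takes_value and i + 1 < len(argv):
            values[tok] = argv[i + 1]  # no check that the value looks like a switch
            i += 2
        else:
            flags.add(tok)
            i += 1
    return values, flags


# What the script receives when $testname expands to nothing: the word vanishes.
argv = ["-ws", "/ws", "-td", "/data", "-lc", "/caracal", "-ct", "-dm", "-or", "-f"]
values, flags = parse_args(argv)
print(values["-ct"])    # the test name has become "-dm"
print("-dm" in flags)   # and the -dm switch was never registered
```

A parser that wanted to catch this could reject values starting with `-`, at the cost of forbidding such (legitimate, if odd) directory names.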

PeterKamphuis commented 4 years ago

Hoezeeee, I was worried it was my new and wonderful invocation of cubical. Should we change the title then to caratekit.sh crashes when you invoke it with undeclared shell variables? Or just close this issue?

KshitijT commented 4 years ago

Let's wait until @o-smirnov confirms with the correct caratekit test?

o-smirnov commented 4 years ago

Should we change the title then to caratekit.sh crashes when you invoke it with undeclared shell variables?

Shell variables have nothing to do with it -- all I did was (in effect) invoke it with -ct -dm, i.e., with a somewhat funny (but entirely legit) output test directory name. It looks to have defaulted to a minimal test anyway. And crashed in cubical.

Running it with a boring test name results in the same crash:

~/projects/caracal/caratekit.sh -ws $workspace -td $testdata -lc ~/projects/caracal/ -ct test1 -dm -or -f

I'll try it with -da now.

o-smirnov commented 4 years ago

Yes. You can check by looking at the file system_info.txt in $testname

@gigjozsa, I see no such file under my $testname folder:

oms@young:~/projects/caratekit/test1$ ls -lrt
total 24
drwxrwxr-x 6 oms oms 4096 Apr  4 19:33 meerkathi
drwxrwxr-x 7 oms oms 4096 Apr 22 20:37 caracal
drwxrwxr-x 7 oms oms 4096 Apr 22 20:37 caracal_venv
drwxrwxr-x 6 oms oms 4096 Apr 22 20:39 home
drwxrwxr-x 6 oms oms 4096 Apr 22 20:49 minimalConfig-docker
drwxrwxr-x 3 oms oms 4096 Apr 22 20:49 report
oms@young:~/projects/caratekit/test1$ ls -lrt */*txt
-rw-rw-r-- 1 oms oms 30 Apr  4 19:33 meerkathi/stimela_master.txt
-rw-rw-r-- 1 oms oms 95 Apr  4 19:33 meerkathi/stimela_last_stable.txt
-rw-rw-r-- 1 oms oms 54 Apr 22 20:37 caracal/stimela-master.txt
-rw-rw-r-- 1 oms oms 94 Apr 22 20:37 caracal/stimela-last_stable.txt
oms@young:~/projects/caratekit/test1$ ls -lrt */*/*txt
-rw-rw-r-- 1 oms oms      10 Apr  4 19:33 meerkathi/meerkathi.egg-info/top_level.txt
-rw-rw-r-- 1 oms oms   21805 Apr  4 19:33 meerkathi/meerkathi.egg-info/SOURCES.txt
-rw-rw-r-- 1 oms oms     177 Apr  4 19:33 meerkathi/meerkathi.egg-info/requires.txt
-rw-rw-r-- 1 oms oms       1 Apr  4 19:33 meerkathi/meerkathi.egg-info/dependency_links.txt
-rw-rw-r-- 1 oms oms       8 Apr 22 20:37 caracal/caracal.egg-info/top_level.txt
-rw-rw-r-- 1 oms oms    5385 Apr 22 20:37 caracal/caracal.egg-info/SOURCES.txt
-rw-rw-r-- 1 oms oms     177 Apr 22 20:37 caracal/caracal.egg-info/requires.txt
-rw-rw-r-- 1 oms oms       1 Apr 22 20:37 caracal/caracal.egg-info/dependency_links.txt
-rw-rw-r-- 1 oms oms    1457 Apr 22 20:43 report/minimalConfig-docker/minimalConfig-docker.yml.txt
-rw-rw-r-- 1 oms oms    1215 Apr 22 20:43 report/minimalConfig-docker/minimalConfig-docker.sh.txt
lrwxrwxrwx 1 oms oms      31 Apr 22 20:43 minimalConfig-docker/output/log-caracal.txt -> log-caracal-20200422-204318.txt
-rw-rw-r-- 1 oms oms    5966 Apr 22 20:43 minimalConfig-docker/input/mk64.txt
-rw-r--r-- 1 oms oms    4023 Apr 22 20:43 minimalConfig-docker/output/1477074305.subset-obsinfo.txt
-rw-r--r-- 1 oms oms    3544 Apr 22 20:43 minimalConfig-docker/output/1477074305.subset_cal-obsinfo.txt
-rw-r--r-- 1 oms oms    3274 Apr 22 20:48 minimalConfig-docker/output/1477074305.subset-IC5264_corr-obsinfo.txt
-rw-rw-r-- 1 oms oms 1026187 Apr 22 20:49 minimalConfig-docker/output/log-caracal-20200422-204318.txt
-rw-rw-r-- 1 oms oms 1026187 Apr 22 20:49 report/minimalConfig-docker/minimalConfig-docker-log-caracal.txt
-rw-rw-r-- 1 oms oms    1933 Apr 22 20:49 report/minimalConfig-docker/minimalConfig-docker-sysinfo.txt
oms@young:~/projects/caratekit/test1$

PeterKamphuis commented 4 years ago

@o-smirnov That is rather disappointing. I'm running the same test but without -ws and -td as I have set them through the env, but am nowhere near the selfcal yet. I do get these wonderful messages though:

/Pipelines/Caracal/carate_test/test1/caracal_venv/lib/python3.6/site-packages/caracal/workers/inspect_data_worker.py:207: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  corrs = yaml.load(stdr)['CORR']['CORR_TYPE']
2020-04-22 21:09:04 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:06 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'
2020-04-22 21:09:09 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:11 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'
2020-04-22 21:09:14 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_ant'
2020-04-22 21:09:16 CARACal WARNING: The plotter 'shadems' cannot make the plot 'amp_scan'

paoloserra commented 4 years ago

Yeah. I get those too, outside of the kit. Something to fix, but the pipeline does continue.

o-smirnov commented 4 years ago

Yeah ignore those, I'm going to revamp the inspect worker anyway, now that shadems can plot the Universe versus God coloured by Eternity.

I'm more worried about my continuing test failures. Running with -da (full test, right @gigjozsa?), I get this:

# --> CrashReporter initialized.                                                                                            
# 2020-04-22 19:17:56   INFO    gaincal::::                                                                                                                 
# 2020-04-22 19:17:56   INFO    gaincal::::+    ##########################################
# 2020-04-22 19:17:56   INFO    gaincal::::+    ##### Begin Task: gaincal            #####                                    
# 2020-04-22 19:17:56   INFO    gaincal::::     gaincal(vis="/stimela_mount/msdir/1477074305.subset_cal.ms",caltable="/stimela_mount/output/mypipelinerun-1477074305.subset-1gc1_primary_cal.K0",field="PKS1934-638",spw="",intent="",
# 2020-04-22 19:17:56   INFO    gaincal::::+            selectdata=True,timerange="",uvrange="",antenna="",scan="", 
# 2020-04-22 19:17:56   INFO    gaincal::::+            observation="",msselect="",solint="inf",combine="",preavg=-1.0,
# 2020-04-22 19:17:56   INFO    gaincal::::+            refant="m010",refantmode="flex",minblperant=4,minsnr=3.0,solnorm=False,
# 2020-04-22 19:17:56   INFO    gaincal::::+            normtype="mean",gaintype="K",smodel=[],calmode="ap",solmode="",
# 2020-04-22 19:17:56   INFO    gaincal::::+            rmsthresh=[],append=False,splinetime=3600.0,npointaver=3,phasewrap=180.0,
# 2020-04-22 19:17:56   INFO    gaincal::::+            docallib=False,callib="",gaintable=[],gainfield=[],interp=[],
# 2020-04-22 19:17:56   INFO    gaincal::::+            spwmap=[],parang=False)                   
# 2020-04-22 19:17:56   INFO    gaincal::calibrater::open       ****Using NEW VI2-driven calibrater tool****
# 2020-04-22 19:17:56   INFO    gaincal::calibrater::open       Opening MS: /stimela_mount/msdir/1477074305.subset_cal.ms for calibration.                  
# 2020-04-22 19:17:56   INFO    gaincal::Calibrater::   Initializing nominal selection to the whole MS.
# 2020-04-22 19:17:56   INFO    gaincal::::     NB: gaincal automatically excludes auto-correlations.
# 2020-04-22 19:17:56   INFO    calibrater::setdata     Beginning selectvis--(MSSelection version)-------
# 2020-04-22 19:17:56   INFO    calibrater::reset       Reseting solve/apply state                         
# 2020-04-22 19:17:56   INFO    Calibrater::selectvis   Performing selection on MeasurementSet                                     
# 2020-04-22 19:17:56   INFO    Calibrater::selectvis+   Selecting on field: 'PKS1934-638'                                                                  
# 2020-04-22 19:17:56   INFO    Calibrater::selectvis+   Selecting with TaQL: 'ANTENNA1!=ANTENNA2'
# 2020-04-22 19:17:56   INFO    Calibrater::selectvis   By selection 4675 rows are reduced to 3195                                               
# 2020-04-22 19:17:56   INFO    Calibrater::selectvis   Frequency selection: Selecting all channels in all spws.
# 2020-04-22 19:17:56   INFO    calibrater::setdata     chanmode=none nchan=1 start=0 step=1 mStart='0km/s' mStep='0km/s' msSelect='ANTENNA1!=ANTENNA2'
# 2020-04-22 19:17:56   INFO    calibrater::setsolve    Beginning setsolve--(MSSelection version)-------
# 2020-04-22 19:17:56   SEVERE          Exception Reported: Antenna Expression: No match found for token(s) "m010"
# *** Error *** Antenna Expression: No match found for token(s) "m010"                    
# 2020-04-22 19:17:56   SEVERE  gaincal::::     Error in gaincal: Antenna Expression: No match found for token(s) "m010" 
# 2020-04-22 19:17:56   SEVERE  gaincal::::     An error occurred running task gaincal.                           
# Traceback (most recent call last):

No such antenna m010, but from the other messages I suspect it's still using the subset MS. Maybe I didn't unpack rawdata.tar correctly. But how come it didn't fail completely? I thought -da implies the full test -- if it doesn't see the full data, should it not fail?

KshitijT commented 4 years ago

Maybe because the test data isn't full pol?

KshitijT commented 4 years ago

No such antenna m010.

The test dataset doesn't have all the antennas, only half.

PeterKamphuis commented 4 years ago

@KshitijT but shouldn't we detect an incorrect refant before the gaincal?
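A pre-flight check of the kind asked about here could look roughly like the sketch below. The function name and the antenna list are hypothetical; in a real run the names would have to be read from the MS antenna table, and a numeric refant such as '0' (mentioned later in the thread) would need separate handling that this sketch omits.

```python
def check_refant(refant, antenna_names):
    """Raise ValueError early if refant is not among the antenna names of the MS."""
    if refant not in antenna_names:
        available = ", ".join(antenna_names)
        raise ValueError(f"refant {refant!r} not found; available antennas: {available}")
    return refant


# Hypothetical antenna list for a subset MS that lacks m010.
subset_antennas = ["m000", "m001", "m002", "m003"]

print(check_refant("m000", subset_antennas))
try:
    check_refant("m010", subset_antennas)
except ValueError as err:
    print(err)
```

Failing here would surface the problem before gaincal is launched rather than inside its log.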

KshitijT commented 4 years ago

So, for me refant is set to '0' and it goes for 'm000' but I was using @gigjozsa 's test data. This seems to be IC5264 data (the very tiny one Ben shared). Let me recheck with the IC5264 data.

o-smirnov commented 4 years ago

Maybe because the test data isn't full pol?

A-ha, now that's a thought! The offending code in CubiCal does do some funny shape calculations. Maybe it gets confused by the subset test data...

Anyway, I'd still like to understand from @gigjozsa how exactly it selects test data. I naively assumed -dm takes the subset and -da takes the full data, but it looks like I managed to start -da on the subset somehow. Is there an implicit selection happening?

PeterKamphuis commented 4 years ago

@o-smirnov The cubical crash happened in the first iteration of the first set?

Because I just got:

2020-04-22 21:36:03 CARACal.Stimela.calibrate_cubical_1_0 INFO: job complete at 2020-04-22 21:36:03.772850 after 0:03:36.917910

I noticed though that you call everything with the ~ which I avoid out of habit in bash.

PeterKamphuis commented 4 years ago

test invoked with caratekit.sh -lc /home/peter/Pipelines/Caracal/caracal -ct test1 -dm -or -f

o-smirnov commented 4 years ago

@PeterKamphuis can you check your log and tell me what the MS name(s) were? I suspect we're using different ones...

The ~ can't be a problem -- bash will substitute it before caratekit even sees it...

o-smirnov commented 4 years ago

My MS was 1477074305.subset-IC5264_corr.ms when CubiCal failed.

PeterKamphuis commented 4 years ago

@o-smirnov ah no I do not have those.

get_data:
  dataid: ['1524929477','1524947605','1532022061']

PeterKamphuis commented 4 years ago

but these are not full polarization either so that should not be the issue for the crash. Unless somehow Caracal assumes your test set is full polarization while it isn't?

KshitijT commented 4 years ago

A-ha, now that's a thought! The offending code in CubiCal does do some funny shape calculations. Maybe it gets confused by the subset test data...

But that was for the test data I was using, not for yours, @o-smirnov . The one you are using is full pol.

EDIT: I was talking about the shadems errors.

PeterKamphuis commented 4 years ago

But that was for the test data I was using, not for yours, @o-smirnov . The one you are using is full pol.

Maybe that is the problem then? I don't think I ever tested on full polarization data.

gigjozsa commented 4 years ago

I'd still like to understand from @gigjozsa how exactly it selects test data

Sorry, did not get that before. With

-td directory

you supply the test data directory and any *.ms data set in that directory will be copied into the msdir of the test.

Then, the config file will be edited and all MSs found will be inserted into dataid.

So, whatever is in your directory will be processed. This means you have to take care that there is not more in that directory than you want to be processed. There is a way to suppress that behaviour, but it is not the default.
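The selection behaviour described above can be sketched as follows. This is an illustration in Python, not caratekit's own code; the function name is hypothetical, and it assumes that a dataid is simply the MS name without its `.ms` extension.

```python
import tempfile
from pathlib import Path


def collect_dataids(testdata_dir):
    """Return the sorted base names of every *.ms entry found in the directory."""
    return sorted(p.stem for p in Path(testdata_dir).glob("*.ms"))


with tempfile.TemporaryDirectory() as td:
    # Two measurement sets (directories) and one unrelated file.
    (Path(td) / "1477074305.subset.ms").mkdir()
    (Path(td) / "1524929477.ms").mkdir()
    (Path(td) / "notes.txt").touch()
    dataid = collect_dataids(td)

print(dataid)  # every MS in the directory ends up in dataid, wanted or not
```

Because the list is built from directory contents alone, a leftover subset MS in the test data directory is picked up regardless of which test mode was requested.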

KshitijT commented 4 years ago

Ok, I just reproduced the error with a standalone test, so probably nothing to do with caratekit. I used minimal config and @o-smirnov's test dataset.

o-smirnov commented 4 years ago

So, whatever is in your directory will be processed.

Yep, I was beginning to suspect that, thanks for confirming. I think that might be a little too clever. I think the default should be minimal test = subset data, full test = full data (or at least document the clever behaviour prominently up front -- as a naive user I definitely did not expect it!) In any case, I've discovered that the full test is not compatible with the subset data as things stand, so unless we change the full test config to be compatible with the subset data, we shouldn't let it even try.

So to summarize the current status:

* `-da` with full data is running now, and I'll report when it finishes.
* `-da` with subset data crashes in gaincal (for me) with unknown antenna m010
* `-dm` with full data works (for @PeterKamphuis), I haven't tried it yet
* `-dm` with subset data crashes in CubiCal (for me), could someone please try this for themselves?

Let's please discuss shadems errors in #972, this issue is already confusing enough

PeterKamphuis commented 4 years ago

Ok, I just reproduced the error with a standalone test, so probably nothing to do with caratekit. I used minimal config and @o-smirnov's test dataset.

@KshitijT And standalone Cubical?

KshitijT commented 4 years ago

Ok, I just reproduced the error with a standalone test, so probably nothing to do with caratekit. I used minimal config and @o-smirnov's test dataset.

@KshitijT And standalone Cubical?

By standalone I meant a caracal run without caratekit.

PeterKamphuis commented 4 years ago

@KshitijT And standalone Cubical?

By standalone I meant a caracal run without caratekit.

I understood but I meant can you try with a standalone cubical?

o-smirnov commented 4 years ago

I'll do that too, it's a most curious error. But tomorrow. Going to go sulk in bed now. Thanks for the help, everyone.

KshitijT commented 4 years ago

@KshitijT And standalone Cubical?

By standalone I meant a caracal run without caratekit.

I understood but I meant can you try with a standalone cubical?

Ah. I'll try, let me first see if this is a data / config issue. Because the other test datasets went through the same caracal/stimela combo fine, but were done with meerkat default config. I am running @o-smirnov 's dataset through the meerkat default config now, that should tell us if it is the data or the config. After that, I'll try out with Cubical separately.

KshitijT commented 4 years ago

Ok, reproduced the same error with the meerkat default config, so maybe the data is to blame. Funny, we have used it so many times without any trouble before.

On to standalone cubical test...

KshitijT commented 4 years ago

Same error with a standalone Cubical test.

KshitijT commented 4 years ago

Okay, I untarred the dataset freshly, ran manual wsclean imaging (no pipeline involved) and then tried to calibrate it with the same parset. Same error.

So at this point the stuff before selfcal in the pipeline has nothing to do with the error (since the above dataset was not even cross-calibrated). So either something is wrong with the cubical settings, or the dataset is the culprit, or something has changed in the cubical version since we last tested with this dataset.

KshitijT commented 4 years ago

Update: Calibration goes through with the default parset. So definitely something to do with cubical settings. EDIT: Actually a combination of the settings with this dataset, since other datasets are perfectly fine.

KshitijT commented 4 years ago

So this is the culprit: (https://github.com/ska-sa/caracal/blob/master/caracal/workers/self_cal_worker.py#L1153) and the corresponding line down for the interpolation:

                "sel-diag": take_diag_terms,

where:

        if matrix_type == 'Gain2x2':
            take_diag_terms = False
        else:
            take_diag_terms = True

I set "sel-diag" : False, and the pipeline goes through fine. I guess we should make this dependent on the data type?

Since the defaults are phase-diag etc. take_diag_terms is by default set to True in the pipeline. I guess for full-pol data (like the one @o-smirnov used) these settings are not appreciated, while for the other test data which is just the diagonal correlations, it's fine.

Mystery solved?
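One way to express the "make it dependent on the data" idea is sketched below. The helper name and the `ncorr` argument are hypothetical, and the rule for four-correlation data only mirrors the observation above that `sel-diag: False` lets this dataset through. Treat it as a stopgap to dodge the crash, not as a statement of the correct calibration setting or of what any actual fix does.

```python
def choose_sel_diag(matrix_type, ncorr):
    """Decide the value passed as "sel-diag" (hypothetical helper).

    matrix_type: the solver matrix type, e.g. 'Gain2x2' or a diagonal type.
    ncorr:       number of correlations in the MS (assumed to be known).
    """
    if matrix_type == "Gain2x2":
        # Same as the existing pipeline logic: full 2x2 solve, keep all terms.
        return False
    # Assumption: skip diagonal selection on 4-correlation data, since that
    # combination is the one seen to crash in this thread.
    return ncorr != 4


print(choose_sel_diag("Gain2x2", 4))
print(choose_sel_diag("GainDiag", 2))
print(choose_sel_diag("GainDiag", 4))
```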

PeterKamphuis commented 4 years ago

@KshitijT Thank you for tracing this down. @o-smirnov Is this then a Cubical bug or a feature? I don't see immediately why for full polarization ignoring the off diagonal terms should not work. A second thing is whether it is the right thing to do for full polarization? I thought that we discussed that when not calibrating the full matrix it is better to set the off diagonal terms to 0. Maybe I misunderstood and that is only for when they are not present in the .ms?

Your error occurs on load as far as I could see so I don't think so, but I realized that madmax-offdiag is not set and its default is 1. Could that cause the problems?

gigjozsa commented 4 years ago

So, whatever is in your directory will be processed.

Yep, I was beginning to suspect that, thanks for confirming. I think that might be a little too clever. I think the default should be minimal test = subset data, full test = full data (or at least document the clever behaviour prominently up front -- as a naive user I definitely did not expect it!) In any case, I've discovered that the full test is not compatible with the subset data as things stand, so unless we change the full test config to be compatible with the subset data, we shouldn't let it even try.

So to summarize the current status:
* `-da` with full data is running now, and I'll report when it finishes.

* `-da` with subset data crashes in gaincal (for me) with unknown antenna m010

* `-dm` with full data works (for @PeterKamphuis), I haven't tried it yet

* `-dm` with subset data crashes in CubiCal (for me), could someone please try this for themselves?

Too much rope, I see, or rope too twisted. OK, for caratekit I will:

* add a dialogue asking the user to confirm if she wants to run an unusual combination of data and -da, -sa, -sa, or -sm. The usual user would not use those switches anyway. Any security question will be overridden by the -or switch, including this one.
* add a dialogue asking the user to confirm that dataid will be changed if it is not empty in any provided config file. Same thing, will not be done if -or is set.
* enable a mode where only data are copied across which are in the dataid of the config file, which automatically implies the -kc switch to be set (which is the do-not-change-the-config-file switch)

o-smirnov commented 4 years ago

@PeterKamphuis of course this is not a feature lol. Unless you consider perverse error messages as feature! Voila (and thanks @KshitijT for narrowing it down): https://github.com/ratt-ru/CubiCal/issues/368

enable a mode where only data are copied across which are in the dataid of the config file, which automatically implies the -kc switch to be set (which is the do-not-change-the-config-file switch)

Aha, I overlooked the -kc switch. I think you should edit the "standard use cases" issue, and add the switch there, because we definitely want to be doing the standard PR tests with that switch in place.

In the meantime, with -dm and full data far I got, far, but still on my face I fall:

2020-04-23 01:37:39 CARACal INFO: Corresponding pathnames are:
2020-04-23 01:37:39 CARACal INFO: ['output/cubes', 'output/cubes', 'output/cubes']
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p2_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p3_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: output/cubes/cube_2/mypipelinerun_circinus_p1_HI.pb.fits is already in place, and will be used by montage_mosaic.
2020-04-23 01:37:39 CARACal INFO: Checking for *pb.fits files now complete.
2020-04-23 01:37:39 CARACal INFO: Now creating symlinks to images and beams, in case they are distributed across multiple subdirectories
2020-04-23 01:37:40 CARACal INFO: mosaic: running       
2020-04-23 01:37:40 CARACal.Stimela.mosaic-steward INFO: job started at 2020-04-23 01:37:40.159662
# 829d2dc2df40c2dcfefd9690fd3032323b87a7b1e6adaff605d7ffaa5319b9c4
# Traceback (most recent call last):        
#   File "/stimela_mount/code/run.py", line 20, in <module>
#     cab = utils.readJson(CONFIG)              
# NameError: name 'utils' is not defined
2020-04-23 01:37:41 CARACal.Stimela.mosaic-steward ERROR: docker returns error code 1
2020-04-23 01:37:41 CARACal.Stimela.mosaic-steward ERROR: job failed at 2020-04-23 01:37:41.880430 after 0:00:01.720768
2020-04-23 01:37:41 CARACal ERROR: Job 'MosaicSteward:: Re-gridding spectral images before mosaicking them. For this mode, the mosaic_worker is using *pb.fits files generated by the image_line_worker.' failed: docker returns error code 1 [PipelineException]
2020-04-23 01:37:41 CARACal INFO:   More information can be found in the logfile at output/log-caracal-20200422-220253.txt
2020-04-23 01:37:41 CARACal INFO: exiting with error code 1
CARACal run carateConfig-docker returned an error.
Checking output of carateConfig docker test

PeterKamphuis commented 4 years ago

@o-smirnov Well I meant more did I misunderstand the correct settings?

Just to have this clear, the setting in principle is correct? When we use gain-update-type: full we want sel-diag: False and whenever we have gain-update-type: xx-diag we want sel-diag: True. Also for full polarization?

I found the test subset dataset and tried to make a workaround for the bug, but to be honest this dataset should crash the selfcal worker in any case, as the first image has no sources in it. I remember this is why I never used it for testing.

o-smirnov commented 4 years ago

Just to have this clear, the setting in principle is correct? When we use gain-update-type: full we want sel-diag: False and whenever we have gain-update-type: xx-diag we want sel-diag: True. Also for full polarization?

In principle, yes (there may be some more exotic combinations for polcal etc.), the default selfcal behaviour should be as you describe.

PeterKamphuis commented 4 years ago

Well I assume that for a polarization calibration you have to change heaps of settings as we also only produce stokes I images by default. Setting the matrix_type to Gain2x2 seems an obvious one. I'll tag @francescaLoi here as I understand she is looking at polarization calibration with Caracal.

PeterKamphuis commented 4 years ago

So I made a temporary fix for the cubical bug in PR #990. @o-smirnov Do you want to keep this issue for this new error or should we close it?

o-smirnov commented 4 years ago

Let's keep it open until the CubiCal bug is fixed and propagated into Stimela...

o-smirnov commented 4 years ago

I give up on my -dm test (with full data) after 9h of crystallballing... has this worked for anyone?

I'll try -da over night.

paoloserra commented 4 years ago

Crystalball is an extremely time-consuming step. You really shouldn't try it on a large .MS unless you're interested in the science.

In my opinion there is no reason to test Caracal's PRs on large .MS files.

I've been running all my tests with all workers -- testing as many combinations of parameters as possible -- on my desktop machine using 3 ~200 MB .MS files, each with multiple targets. Typically, a Caracal run completes in ~1 h.

o-smirnov commented 4 years ago

Holy shit, what did I do right?!!

If come from inside you, always right one.
Caratekit succeeded.

Can we please merge my PR quick quick before the stars un-align again?

In my opinion there is no reason to test Caracal's PRs on large .MS files.

Probably not, but then I'm confused. From @gigjozsa's diktat I understood we are to run caratekit fully before committing a PR. But the only combination that has worked for me so far has been -da with full data. Clarity needed!

gigjozsa commented 4 years ago

The directive was to go easy on the full tests while we are in a busy phase, and to use it to convince ourselves that all's good. We can even go easy on this by believing you and not requiring an independent test.

caracal-pipeline / caracal

Test errors with some combinations of caratekit test settings and test MSs #984