caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0
29 stars 6 forks source link

Caratekit standard config fails at modelsub step in image_line #891

Closed KshitijT closed 4 years ago

KshitijT commented 4 years ago

It does so with this message:

meerkathi - 2020-04-03 17:22:49,055 CRITICAL - Starting container [modelsub_0-14024305716868815859345639716818]. Timeout set to -1. The container ID is printed below.
meerkathi - 2020-04-03 17:22:49,056 CRITICAL - Running: docker start -a modelsub_0-14024305716868815859345639716818
Starting virtual X frame buffer: Xvfb.
meerkathi - 2020-04-03 17:22:50,667 CRITICAL - Successful read/write open of default-locked table /home/kthorat/msdir/1532022061-circinus_p3_l.ms: 25 columns, 92070 rows
meerkathi - 2020-04-03 17:22:50,667 CRITICAL - Wrtiting to column  CORRECTED_DATA (rows 0 to 9206)
meerkathi - 2020-04-03 17:22:50,667 CRITICAL - Traceback (most recent call last):
meerkathi - 2020-04-03 17:22:50,667 CRITICAL -   File "/scratch/code/run.py", line 70, in <module>
meerkathi - 2020-04-03 17:22:50,667 CRITICAL -     run_func(**_args)
meerkathi - 2020-04-03 17:22:50,667 CRITICAL -   File "/usr/local/lib/python3.6/dist-packages/MSUtils/msutils.py", line 244, in sumcols
meerkathi - 2020-04-03 17:22:50,667 CRITICAL -     data = tab_spw.getcol(col1, row0, nr) - tab_spw.getcol(col2, row0, nr)
meerkathi - 2020-04-03 17:22:50,667 CRITICAL -   File "/usr/local/lib/python3.6/dist-packages/casacore/tables/table.py", line 1031, in getcol
meerkathi - 2020-04-03 17:22:50,668 CRITICAL -     return self._getcol(columnname, startrow, nrow, rowincr)
meerkathi - 2020-04-03 17:22:50,668 CRITICAL - RuntimeError: TableProxy::getColumn: column MODEL_DATA does not exist
meerkathi - 2020-04-03 17:22:53,064 CRITICAL - INFO:STIMELA-15:Recipe execution failed while running job modelsub_0
meerkathi - 2020-04-03 17:22:53,064 CRITICAL - INFO:STIMELA-15:Completed jobs : []
meerkathi - 2020-04-03 17:22:53,064 CRITICAL - INFO:STIMELA-15:Remaining jobs : ['mstransform_0', 'listobs_0', 'summary_json_0', 'flag_mst_errors']
meerkathi - 2020-04-03 17:22:53,064 CRITICAL - INFO:STIMELA-15:Logging remaining task: mstransform_0:: Doppler tracking corrections
meerkathi - 2020-04-03 17:22:53,065 CRITICAL - INFO:STIMELA-15:Logging remaining task: listobs_0:: Get observation information ms=1532022061-circinus_p3_l_mst.ms
meerkathi - 2020-04-03 17:22:53,065 CRITICAL - INFO:STIMELA-15:Logging remaining task: summary_json_0:: Get observation information as a json file ms=1532022061-circinus_p3_l_mst.ms
meerkathi - 2020-04-03 17:22:53,065 CRITICAL - INFO:STIMELA-15:Logging remaining task: flag_mst_errors:: file ms=1532022061-circinus_p3_l_mst.ms
meerkathi - 2020-04-03 17:22:53,065 CRITICAL - INFO:STIMELA-15:Saving pipeline information in .last_20200403-1619_make_line_cube.json
meerkathi - 2020-04-03 17:22:53,066 ERROR - An unhandled exeption occured. If you think this is a bug please report it.
meerkathi - 2020-04-03 17:22:53,066 ERROR - Your logfile is here: /home/kthorat/last_day_testing/chunky_test/test_carateConfig_docker/log-meerkathi.txt.
meerkathi - 2020-04-03 17:22:53,066 ERROR - You are running version: 0.3.0
meerkathi - 2020-04-03 17:22:53,067 ERROR - Traceback (most recent call last):
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1056, in run
    run_job()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 125, in run_docker_job
    self.job.start()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/docker.py", line 169, in start
    kill_callback=lambda: utils.xrun("docker", ["kill", self.name]))
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/utils/__init__.py", line 181, in xrun
    '%s: returns errr code %d' % (command, process.returncode))
stimela.utils.StimelaCabRuntimeError: docker: returns errr code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/main.py", line 166, in __run
    pipeline.run_workers()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/workers/worker_administrator.py", line 297, in run_workers
    worker.worker(self, recipe, config)
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/workers/image_line_worker.py", line 473, in worker
    recipe.run()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1084, in run
    raise_(pe, None, sys.exc_info()[2])
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/future/utils/__init__.py", line 440, in raise_
    raise exc.with_traceback(tb)
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1056, in run
    run_job()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 125, in run_docker_job
    self.job.start()
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/docker.py", line 169, in start
    kill_callback=lambda: utils.xrun("docker", ["kill", self.name]))
  File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/utils/__init__.py", line 181, in xrun
    '%s: returns errr code %d' % (command, process.returncode))
stimela.exceptions.PipelineException: Exception occurred while running pipeline component 'modelsub_0:: Subtract model column': docker: returns errr code 1

meerkathi - 2020-04-03 17:22:53,067 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1056, in run
meerkathi - 2020-04-03 17:22:53,067 CRITICAL -     run_job()
meerkathi - 2020-04-03 17:22:53,067 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 125, in run_docker_job
meerkathi - 2020-04-03 17:22:53,067 CRITICAL -     self.job.start()
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/docker.py", line 169, in start
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -     kill_callback=lambda: utils.xrun("docker", ["kill", self.name]))
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/utils/__init__.py", line 181, in xrun
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -     '%s: returns errr code %d' % (command, process.returncode))
meerkathi - 2020-04-03 17:22:53,068 CRITICAL - stimela.utils.StimelaCabRuntimeError: docker: returns errr code 1
meerkathi - 2020-04-03 17:22:53,068 CRITICAL - 
meerkathi - 2020-04-03 17:22:53,068 CRITICAL - During handling of the above exception, another exception occurred:
meerkathi - 2020-04-03 17:22:53,068 CRITICAL - 
meerkathi - 2020-04-03 17:22:53,068 CRITICAL - Traceback (most recent call last):
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/main.py", line 166, in __run
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -     pipeline.run_workers()
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/workers/worker_administrator.py", line 297, in run_workers
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -     worker.worker(self, recipe, config)
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/meerkathi/workers/image_line_worker.py", line 473, in worker
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -     recipe.run()
meerkathi - 2020-04-03 17:22:53,068 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1084, in run
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     raise_(pe, None, sys.exc_info()[2])
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/future/utils/__init__.py", line 440, in raise_
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     raise exc.with_traceback(tb)
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 1056, in run
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     run_job()
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/recipe.py", line 125, in run_docker_job
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     self.job.start()
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/docker.py", line 169, in start
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     kill_callback=lambda: utils.xrun("docker", ["kill", self.name]))
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -   File "/home/kthorat/last_day_testing/chunky_test/caracal_venv/lib/python3.5/site-packages/stimela/utils/__init__.py", line 181, in xrun
meerkathi - 2020-04-03 17:22:53,069 CRITICAL -     '%s: returns errr code %d' % (command, process.returncode))
meerkathi - 2020-04-03 17:22:53,069 CRITICAL - stimela.exceptions.PipelineException: Exception occurred while running pipeline component 'modelsub_0:: Subtract model column': docker: returns errr code 1
meerkathi - 2020-04-03 17:22:53,146 INFO - PIPELINER EXITS WITH RETURN CODE 1
CARACal run returned an error.
Checking output of carateConfig docker test

The relevant bit of carateconfig is :

split_target__3:
  enable: true
  label_in: 'corr'
  label_out: 'l'
  split_target:
    enable: true
    column: 'all'
#    spw: '*:1370~1390MHz'
    otfcal:
      enable: false

image_line:
  enable: true
  label: 'l'
  restfreq: '1.420405752GHz'
  subtractmodelcol:
    enable: true
  make_cube:
    enable: true
    npix: [2400]
    cell: 3
    taper: 10
    robust: 0
  mstransform:
    enable: true
    doppler:
      enable: true
      telescope: meerkat
    uvlin:
      enable: true
      fitorder: 3
      fitspw: '*:1300~1417.2MHz,*:1421.0~1600MHz'
  pb_cube:
    enable: true
  flag_mst_errors:
     enable: true
  freq_to_vel:
    enable: true
  remove_stokes_axis:
    enable: true
  sofia:
    enable: true

I suspect this happens because even with column: 'all', mstransform does not copy over the MODEL_DATA column, modelsub tries to subtract the nonexistent MODEL_DATA column from the CORRECTED_DATA and crashes. See:

meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   Input file name is /home/kthorat/msdir/1532022061-circinus_p3_corr.ms
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   Data column is ALL
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   Output file name is /home/kthorat/msdir/1532022061-circinus_p3_l.ms
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   Re-index is enabled
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   WEIGHT_SPECTRUM will be written in output MS
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseMsSpecParams   Tile shape is [0]
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::parseDataSelParams  field selection is circinus-p3
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::colCheckInfo        Adding DATA column to output MS from input DATA column
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::colCheckInfo        Adding CORRECTED_DATA column to output MS from input CORRECTED_DATA column
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::initDataSelectionParams     Selected Fields Ids are [0]
meerkathi - 2020-04-03 17:22:02,722 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::open        Select data
meerkathi - 2020-04-03 17:22:02,723 CRITICAL - 2020-04-03 17:22:03      INFO    MSTransformManager::createOutputMSStructure     Create output MS structure
meerkathi - 2020-04-03 17:22:06,929 CRITICAL - 2020-04-03 17:22:07      INFO    MSTransformManager::checkFillWeightSpectrum     Optional column WEIGHT_SPECTRUM found in input MS will be written to output 
MS
meerkathi - 2020-04-03 17:22:06,929 CRITICAL - 2020-04-03 17:22:07      INFO    mstransform:::: Apply the transformations
meerkathi - 2020-04-03 17:22:10,736 CRITICAL - 2020-04-03 17:22:11      INFO    mstransform:::: CASA Version 5.6.1-8
meerkathi - 2020-04-03 17:22:10,737 CRITICAL - 2020-04-03 17:22:11      INFO    mstransform::::
meerkathi - 2020-04-03 17:22:10,937 CRITICAL - 2020-04-03 17:22:11      INFO    mstransform:::: ##### End Task: mstransform          #####

Of course this does not happen with `stanard' usage of the pipeline, since there is no intervening split step between selfcal and line_image workers.

Has anyone else encountered the same error?

The solutions, if the code is to be changed, might be:

  1. A run of msutils which will copy over the model_data to the relevant ms or
  2. (This is what I would like) move crystalball runs to image_line worker.

So I don't know if this constitutes a bug. Up to @paoloserra and @gigjozsa if to amend Caratekit config or to change the code.

gigjozsa commented 4 years ago

It did not do this before...

gigjozsa commented 4 years ago

I will test this and hopefully not reproduce. Has the implementation of mstransform in stimela changed during this week (e.g. were we using 4.something before and not now?)

KshitijT commented 4 years ago

I don't think so; please see if it is reproducible. I don't know why I did not encounter this earlier - maybe I just used the minimal or meerkat configs.

paoloserra commented 4 years ago

I will also look into it today.

gigjozsa commented 4 years ago

@KshitijT @paoloserra I started this yesterday fresh from master. Had to edit carateConfig.yml for keywords clean_mask_threshold: [20,10,5,5] clean_threshold: [0.5,0.5,0.5,0.5] in selfcal worker. I am not finished yet because I want to let it run through and see if I can confirm a report of stripes reappearing, but what I got so far is:

Starting container [modelsub_0-14029794096062415859569100829802]. Timeout set to -1. The container ID is printed below.
Running: docker start -a modelsub_0-14029794096062415859569100829802
Successful read/write open of default-locked table /home/jozsa/msdir/1524929477-circinus_p1_l.ms: 26 columns, 47616 rows
Wrtiting to column  CORRECTED_DATA (rows 0 to 4760)
Wrtiting to column  CORRECTED_DATA (rows 4761 to 9521)
Wrtiting to column  CORRECTED_DATA (rows 9522 to 14282)
Wrtiting to column  CORRECTED_DATA (rows 14283 to 19043)
Wrtiting to column  CORRECTED_DATA (rows 19044 to 23804)
Wrtiting to column  CORRECTED_DATA (rows 23805 to 28565)
Wrtiting to column  CORRECTED_DATA (rows 28566 to 33326)
Wrtiting to column  CORRECTED_DATA (rows 33327 to 38087)
Wrtiting to column  CORRECTED_DATA (rows 38088 to 42848)
Wrtiting to column  CORRECTED_DATA (rows 42849 to 47609)
Wrtiting to column  CORRECTED_DATA (rows 47610 to 47615)
Container [modelsub_0-14029794096062415859569100829802] has executed successfully
Runtime was 0:00:02.083625.

Will give full report when finished. Should be soon.

gigjozsa commented 4 years ago

It definitely went through, now mosaicing.

gigjozsa commented 4 years ago

@KshitijT Is this happening with docker or singularity? I tested with Docker.

KshitijT commented 4 years ago

Docker.

On Sat, Apr 4, 2020, 11:16 Gyula Jozsa (Josh) notifications@github.com wrote:

@KshitijT https://github.com/KshitijT Is this happening with docker or singularity? I tested with Docker.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ska-sa/meerkathi/issues/891#issuecomment-609000772, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSY6U2LR5UQZOJSMSKXBTRK33G7ANCNFSM4L4KPADA .

KshitijT commented 4 years ago

Hold on, I see what happened.

While testing #888 , I turned off transfer_model, which means self_cal was done on MSs with label "cont" but the model was never passed on to the ones with label "corr" (which did have the corrected data column though, which was duly generated by transfer_apply_gains.

...Which is why the modelsub complains in image_line. Apologies @gigjozsa and @paoloserra , for wasting your time.

Whatever happened to the emoji for "feeling stupid" ?