caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

Issues with obsconf #1546

Open Koketso01 opened 11 months ago

Koketso01 commented 11 months ago

I keep hitting the following error with the latest version of CARACal. With an earlier version of CARACal I don't get the same error. I think it is the obsconf worker:

Successful readonly open of default-locked table /stimela_mount/msdir/j025740-220946_1_h.ms::POLARIZATION: 4 columns, 1 rows

2023-10-23 13:56:28 CARACal.Stimela.summary_json-ms0 ERROR: /usr/local/bin/singularity run --workdir /home/mophahlane/caracal_data/MACj0247/.stimela_workdir-16980621375234609 --containall --userns returns error code 1
2023-10-23 13:56:28 CARACal.Stimela.summary_json-ms0 ERROR: job failed at 2023-10-23 13:56:28.823588 after 0:00:28.855668
2023-10-23 13:56:28 CARACal ERROR: Job 'summary_json-ms0:: Get observation information as a json file ms=j025740-220946_1_h.ms' failed: /usr/local/bin/singularity run --workdir /home/mophahlane/caracal_data/MACj0247/.stimela_workdir-16980621375234609 --containall --userns returns error code 1 [PipelineException]
2023-10-23 13:56:28 CARACal INFO: More information can be found in the logfile at /home/mophahlane/caracal_data/MACj0247/output/logs-20231023-135536/log-caracal.txt
2023-10-23 13:56:28 CARACal INFO: exiting with error code 1

log-caracal.txt
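
When going through log-caracal.txt, it can help to pull out only the ERROR records. The following is a minimal sketch, assuming every record starts with a timestamp, a logger name and a level, as in the excerpt above. The names `LogRecord`, `iter_records` and `errors_only` are illustrative and not part of CARACal or Stimela.

```python
import re
from typing import Iterator, List, NamedTuple


class LogRecord(NamedTuple):
    timestamp: str
    logger: str
    level: str
    message: str


# Start of a record, e.g.
# "2023-10-23 13:56:28 CARACal.Stimela.summary_json-ms0 ERROR: "
# (assumed format, taken from the excerpt in this issue).
_RECORD_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) "
    r"(DEBUG|INFO|WARNING|ERROR|CRITICAL): "
)


def iter_records(text: str) -> Iterator[LogRecord]:
    """Split log text into records, even if several records share one line."""
    matches = list(_RECORD_START.finditer(text))
    for i, m in enumerate(matches):
        # A record's message runs up to the start of the next record.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield LogRecord(m.group(1), m.group(2), m.group(3),
                        text[m.end():end].strip())


def errors_only(text: str) -> List[LogRecord]:
    """Return only the records logged at ERROR level."""
    return [r for r in iter_records(text) if r.level == "ERROR"]
```

Timestamps that appear inside a message (such as the fractional-second one in "job failed at ...") are not followed by a logger name and level, so they do not start a new record.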

SpaceMeerkat commented 11 months ago

I am also seeing this issue for CARACal runs. I have a rolled back version (version 1.0.6) as I was seeing similar issues for the latest version a while back. For some reason they're now occurring with the older version too.

o-smirnov commented 11 months ago

@Athanaseus could you take a look please? @Koketso01, @SpaceMeerkat, which machines are you running on, which working directories, which virtual environments?

SpaceMeerkat commented 11 months ago

Machine: Janis
Working directories (venv, singularity images, output directory):

Koketso01 commented 11 months ago

I use an older version on jake, and it works just fine; I just have limited space there to carry out the complete pipeline run.

I'm having this issue with the latest version on janis.

rawdata directory: /net/sinatra/vault2-ike/ianja/MGCLS_DATA/MACS-J0257.6-2209-0

working directory: /net/sinatra/vault-janis/mophahlane

SpaceMeerkat commented 11 months ago

Just adding to this: the same rolled-back version 1.0.6 also works on Ike, so it must be something to do with janis rather than the CARACal distro.
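
Since the same version behaves differently on Ike and janis, comparing a short environment report from each machine may narrow things down. A minimal sketch, assuming `singularity` is on the PATH and is run from inside the CARACal virtual environment; `environment_report` is a hypothetical helper, not something CARACal provides.

```python
import platform
import shutil
import subprocess
import sys


def environment_report():
    """Collect basic facts about the machine and the active Python environment."""
    exe = shutil.which("singularity")  # None if singularity is not on PATH
    version = None
    if exe is not None:
        try:
            proc = subprocess.run([exe, "--version"], capture_output=True,
                                  text=True, timeout=30)
            version = proc.stdout.strip() or None
        except (OSError, subprocess.TimeoutExpired):
            version = None
    return {
        "hostname": platform.node(),
        "python": sys.version.split()[0],
        "virtualenv": sys.prefix,  # root of the active (virtual) environment
        "singularity_path": exe,
        "singularity_version": version,
    }
```

Running `print(environment_report())` on both machines and diffing the output gives a quick first check before digging into the container itself.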

o-smirnov commented 11 months ago

I am stumped. Looking at @SpaceMeerkat's case in particular -- the transform worker runs, splits out the MS, then there's the last stage of the recipe which generates a summary.json file for the new MS:

2023-10-26 17:52:44 CARACal.Stimela INFO: Parameters validated and saved to /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862/stimela_parameter_files/summary_json_ms0_0-14062544364164816983324599434216.json
2023-10-26 17:52:44 CARACal.Stimela.summary_json-ms0-0 INFO: Starting container [summary_json_ms0_0-14062544364164816983324599434216]. Timeout set to -1. The container ID is printed below.                     
# running cd /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 && singularity run --workdir /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 --containall  --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862/stimela_parameter_files/summary_json_ms0_0-14062544364164816983324599434216.json:/stimela_mount/configfile:ro --bind /net/sinatra/vault-janis/dawsonj5/caracal-new/lib/python3.8/site-packages/stimela/cargo/cab/msutils/src:/stimela_mount/code:ro --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862/passwd:/etc/passwd:rw --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862/group:/etc/group:rw --bind /net/sinatra/vault-janis/dawsonj5/caracal-new/bin/stimela_runscript:/singularity:ro --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/msdir:/stimela_mount/msdir:rw --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/input:/stimela_mount/input:ro --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/output/obsinfo:/stimela_mount/output:rw --bind /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/output/obsinfo/tmp:/stimela_mount/output/tmp:rw /net/sinatra/vault-janis/dawsonj5/sin-images/stimela_msutils_1.4.6.sif /singularity
# WARNING: Overriding HOME environment variable with SINGULARITYENV_HOME is not permitted                                                                                                                        
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms: 26 columns, 6177897 rows                                                                 
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms::FIELD: 9 columns, 1 rows                                                                 
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms::SPECTRAL_WINDOW: 14 columns, 1 rows                                                      
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms::ANTENNA: 8 columns, 61 rows                                                              
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms::STATE: 7 columns, 4 rows                                                                 
# Successful readonly open of default-locked table /stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms::POLARIZATION: 4 columns, 1 rows                                                          
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR: cd /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 && singularity run --workdir /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 --containall returns error code 1
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR: job failed at 2023-10-26 17:52:57.040826 after 0:00:12.051965                                                                                      
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR: Traceback (most recent call last):                                                                                                                 
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:   File "/net/sinatra/vault-janis/dawsonj5/caracal-new/lib/python3.8/site-packages/stimela/recipe.py", line 713, in run                             
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:     job.run_job()                                                                                                                                  
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:   File "/net/sinatra/vault-janis/dawsonj5/caracal-new/lib/python3.8/site-packages/stimela/recipe.py", line 425, in run_job                         
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:     self.job.run(output_wrangler=self.apply_output_wranglers)                                                                                      
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:   File "/net/sinatra/vault-janis/dawsonj5/caracal-new/lib/python3.8/site-packages/stimela/singularity.py", line 123, in run                        
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:     utils.xrun(f"cd {self.execdir} && singularity run --workdir {self.execdir} --containall",                                                      
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:   File "/net/sinatra/vault-janis/dawsonj5/caracal-new/lib/python3.8/site-packages/stimela/utils/xrun_poll.py", line 227, in xrun                   
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR:     raise StimelaCabRuntimeError("{} returns error code {}".format(command_name, status))                                                          
2023-10-26 17:52:57 CARACal.Stimela.summary_json-ms0-0 ERROR: stimela.utils.StimelaCabRuntimeError: cd /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 && singularity run --workdir /net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/.stimela_workdir-16983324577896862 --containall returns error code 1
2023-10-26 17:52:57 CARACal.Stimela.transform__avg INFO: Completed jobs : ['split_field-ms0-0', 'save-caracal_legacy-ms0', 'listobs-ms0-0']                                                                      
2023-10-26 17:52:57 CARACal.Stimela.transform__avg INFO: Remaining jobs : []                                                                                                                                     
2023-10-26 17:52:57 CARACal.Stimela.transform__avg INFO: Saving pipeline information in .last_transform__avg.json                                                                                                

The JSON file is generated and appears to be fine, there are no additional error messages -- just that exit code of 1, seemingly out of nowhere apropos of nothing.
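
One way to chase a bare exit code like this is to re-run the container command by hand and capture both output streams together with the return code. This is only a sketch: `run_and_report` is a hypothetical helper, and the demo command below is a stand-in. For the real case the argument list would be the `singularity run ...` command printed after `# running` in the log, including its full `--bind` list.

```python
import subprocess
import sys


def run_and_report(cmd, tail=20):
    """Run cmd (a list of arguments) and return its exit code together with
    the last `tail` lines of stdout and stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "returncode": proc.returncode,
        "stdout_tail": proc.stdout.splitlines()[-tail:],
        "stderr_tail": proc.stderr.splitlines()[-tail:],
    }


if __name__ == "__main__":
    # Stand-in command: prints one line, then exits with code 1 and no
    # error message, mimicking the symptom described above.
    demo = [sys.executable, "-c",
            "import sys; print('summary written'); sys.exit(1)"]
    print(run_and_report(demo))
```

If stderr stays empty even when run this way, the non-zero status is probably coming from the script inside the container rather than from singularity itself, though that is a guess and not something the log confirms.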

I have repeated the appropriate msutils.summary() call by hand (outside of the container), and that works fine as well.

The junk field of the parameters is empty, so the cleanup here should be a no-op.

Here's the complete stimela parameter file for reference:

{
    'task': 'msutils',
    'base': 'stimela/msutils',
    'binary': 'msutils',
    'msdir': '/net/sinatra/vault-janis/dawsonj5/MGCLS/new/J0600.8-5835/msdir',
    'description': 'Tools for manipulating measurement sets (MSs)',
    'prefix': ' ',
    'tag': ['1.4.6'],
    'version': ['1.0.1'],
    'junk': [],
    'wranglers': [],
    'parameters': [
        {'name': 'command', 'dtype': 'str', 'info': 'MSUtils command to execute', 'required': True, 'positional': False, 'check_io': True, 'value': 'summary'},
        {'name': 'msname', 'dtype': 'file', 'info': 'MS name', 'required': False, 'positional': False, 'check_io': True, 'value': '/stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg.ms'},
        {'name': 'colname', 'dtype': 'str', 'info': 'Column name', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {
            'name': 'outfile',
            'dtype': 'file',
            'info': 'Output file for MS summary (json format)',
            'required': False,
            'positional': False,
            'check_io': False,
            'value': '/stimela_mount/msdir/j060048-583514_0_h-J0600_8_5835-corrfreqavg-summary.json'
        },
        {'name': 'display', 'dtype': 'bool', 'info': 'Display MS summary to stdout', 'required': False, 'positional': False, 'check_io': True, 'value': False},
        {'name': 'shape', 'dtype': 'str', 'info': 'Shape of column to add to MS', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'valuetype', 'dtype': 'str', 'info': 'Column data type', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'data_desc_type', 'dtype': 'str', 'info': 'Data description type for data in column to be added', 'required': False, 'positional': False, 'check_io': True, 'value': 'array'},
        {'name': 'init_with', 'dtype': 'float', 'info': 'Value to initialize new data column with', 'required': False, 'positional': False, 'check_io': True, 'value': True},
        {'name': 'col1', 'dtype': 'str', 'info': 'First column to add/subtract', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'col2', 'dtype': 'str', 'info': 'Second column to add/subtract', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'cols', 'dtype': 'list:str', 'info': 'Columns to sum', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'subtract', 'dtype': 'bool', 'info': "Subtract 'col2' from 'col1' ", 'required': False, 'positional': False, 'check_io': True, 'value': False},
        {'name': 'fromcol', 'dtype': 'str', 'info': 'Column to copy data from', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'tocol', 'dtype': 'str', 'info': 'Column to copy data to', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'addnoise', 'dtype': 'bool', 'info': "Add noise to MS. Will add to 'column/colname'", 'required': False, 'positional': False, 'check_io': True, 'value': False},
        {
            'name': 'sefd',
            'dtype': 'float',
            'info': 'System Equivalent Flux Density, in Jy. The noise will be calculated using this value',
            'required': False,
            'positional': False,
            'check_io': True,
            'value': 0
        },
        {'name': 'addToCol', 'dtype': 'str', 'info': 'Add noise to data in this column', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'noise', 'dtype': 'float', 'info': "Noise in Jy to 'column/colname' data in Jy", 'required': False, 'positional': False, 'check_io': True, 'value': 0},
        {'name': 'spw_id', 'dtype': 'int', 'info': 'SPW ID', 'required': False, 'positional': False, 'check_io': True, 'value': 0},
        {
            'name': 'verify',
            'dtype': 'bool',
            'info': 'Verifies antenna Y positions in MS. If Y coordinate convention is wrong, either fixes the positions (fix=True) or raises an error. hemisphere=-1 makes it assume that the observatory is in the Western hemisphere, hemisphere=1 in the Eastern, or else tries to find observatory name using MS and pyrap.measure',
            'required': False,
            'positional': False,
            'check_io': True,
            'value': True
        },
        {
            'name': 'mode',
            'dtype': 'str',
            'info': 'Mode when estimating spectral weights. If mode=specs, then the weights will be based on the instrument spec sensitivity that is provided via the stats_data option',
            'required': False,
            'positional': False,
            'check_io': True,
            'value': 'specs'
        },
        {'name': 'fit_order', 'dtype': 'int', 'info': 'Fit order for function used to smooth noise/weights', 'required': False, 'positional': False, 'check_io': True, 'value': 9},
        {'name': 'smooth', 'dtype': 'str', 'info': 'Function to use for smoothing the noise/weights', 'required': False, 'positional': False, 'check_io': True, 'value': 'polyn'},
        {
            'name': 'stats_data',
            'dtype': 'list/file/str',
            'info': "File or array containing information about sensitivity as a function of frequency (in Hz). For MeerKAT use the string 'use_package_meerkat_spec' unless you have your own (updated) specs",
            'required': False,
            'positional': False,
            'check_io': False,
            'value': 'use_package_meekat_spec'
        },
        {'name': 'plot_stats', 'dtype': 'file', 'info': 'Plot of estimated spectral noise/weights', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'write_to_ms', 'dtype': 'bool', 'info': 'Save estimated noise/weights in MS', 'required': False, 'positional': False, 'check_io': True, 'value': True},
        {
            'name': 'noise_columns',
            'dtype': 'list:str',
            'info': 'columns to save noise and corresponding noise spectrum',
            'required': False,
            'positional': False,
            'check_io': True,
            'value': ['SIGMA', 'SIGMA_SPECTRUM']
        },
        {
            'name': 'weight_columns',
            'dtype': 'list:str',
            'info': 'columns to save noise and corresponding noise spectrum',
            'required': False,
            'positional': False,
            'check_io': True,
            'value': ['WEIGHT', 'WEIGHT_SPECTRUM']
        },
        {'name': 'ctable', 'dtype': 'file', 'info': 'Calibration table to plot', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'tabtype', 'dtype': 'str', 'info': 'Type of the calibration table', 'required': False, 'positional': False, 'check_io': True, 'value': None},
        {'name': 'plot_dpi', 'dtype': 'int', 'info': 'DPI for the gain plot', 'required': False, 'positional': False, 'check_io': True, 'value': 600},
        {'name': 'subplot_scale', 'dtype': 'int', 'info': 'Scale for the subplots in the gain plot', 'required': False, 'positional': False, 'check_io': True, 'value': 6},
        {'name': 'plot_file', 'dtype': 'str', 'info': 'Filename for gain plot', 'required': False, 'positional': False, 'check_io': True, 'value': 'meerkathi-gai-plot'}
    ]
}
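
The dump above is long, so here is a small sketch for reducing it to the fields that matter for this job: the task, the `junk` list and the parameters that actually carry a value. It assumes the dump is pasted as a Python literal, exactly as printed above. The file on disk is a `.json` file, for which `json.load` would be the natural choice instead. `summarize_cab` is an illustrative name, not a Stimela function.

```python
import ast


def summarize_cab(dump_text):
    """Reduce a pasted cab parameter dump to its task, junk list and the
    parameters whose value is not None."""
    cab = ast.literal_eval(dump_text)  # safe parsing of a Python literal
    values = {
        p["name"]: p["value"]
        for p in cab["parameters"]
        if p["value"] is not None
    }
    return {"task": cab["task"], "junk": cab["junk"], "values": values}


if __name__ == "__main__":
    # Shortened stand-in for the dump above; only a few keys are kept.
    example = """{
        'task': 'msutils',
        'junk': [],
        'parameters': [
            {'name': 'command', 'value': 'summary'},
            {'name': 'colname', 'value': None},
            {'name': 'display', 'value': False},
        ]
    }"""
    print(summarize_cab(example))
```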

@SpheMakh appealing to you now, since msutils and Stimela classic are both your babies -- can you spot something I can't?