caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

Destruction of CARACal #1480

Closed: Fil8 closed this issue 4 months ago

Fil8 commented 1 year ago

Summary of the workflow, showing which processes can be distributed and which cannot (from the MFS data reduction strategy):

[Screenshot, 2023-04-03 11:55:03: workflow summary]

Note: inquire about virtualconcat.

paoloserra commented 1 year ago

A full description of the Fornax processing is here.

Also, see Sect. 3 of https://ui.adsabs.harvard.edu/abs/2023arXiv230211895S

Fil8 commented 1 year ago

This is an example of how to run distributed jobs with .sbatch files on ilifu, as a Slurm job array with one task per spw:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name='calflag'
#SBATCH --cpus-per-task=32
#SBATCH --mem=220GB
#SBATCH --output=calflag-%A_%a-out.log
#SBATCH --error=calflag-%A_%a-err.log
#SBATCH --time=72:00:00
#SBATCH --mail-user=<your-email>
#SBATCH --mail-type=END,FAIL
#SBATCH --array=0-3

# Array index (0-3, set by --array) selects the spw config this task runs
i=$SLURM_ARRAY_TASK_ID

echo "Running SLURM array task $i"
echo "Running on node $HOSTNAME"
#source /idia/projects/fornax/pipeline_configs/.bashrc
now=$(date)
echo "Current time is: $now"

mkdir -p input
cp /idia/projects/fornax/MFS-data-reduction/miscellaneous_scripts/brightHIfornax.txt input/.

export SINGULARITY_CACHEDIR=/scratch3/projects/fornax/.singularity_cache
export SINGULARITY_TMPDIR=/scratch3/projects/fornax/.singularity_tmp
export SINGULARITY_LOCALCACHEDIR=/scratch3/projects/fornax/.singularity_localcache

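# Lexicographically last top-level entry under output/continuum whose path contains "txt"
# (run in a subshell so a missing directory cannot leave us in the wrong place;
# imgFoldLast is not used further in this script)
imgFoldLast=$(cd output/continuum/ && find . | awk '{print $NF}' | grep txt | cut -sd / -f 2 | sort | tail -n 1)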

# Copy the calibration tables (e.g. 1gc1, used by otfcal) into this task's output directory
mkdir -p output_spw$i/caltables
cp -r output/caltables/* output_spw$i/caltables/.

# Run CARACal on this task's config; -sid points to the directory of pre-built Singularity images
source /idia/projects/fornax/caracal-venv/bin/activate
caracal -c calflag_spw$i.yml -sid /software/astro/caracal/STIMELA_IMAGES_1.7.5/
deactivate

chmod -R g+w .

# Check that the caracal log exists
if [ ! -f output_spw$i/log-caracal.txt ]; then
  echo "CARACal log not found!"
  now=$(date)
  echo "Current time is: $now"
  exit 1
fi

# Check if the pipeline completed or crashed
string=$(tail -1 output_spw$i/log-caracal.txt)
if [[ $string == *"error code 1"* ]]; then
  echo "CARACal crashed :("
  now=$(date)
  echo "Current time is: $now"
  exit 1
else
  echo "CARACal finished successfully! Huzzah!!"
  now=$(date)
  echo "Current time is: $now"
fi
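
Once the whole array has finished, the same two checks can be applied to every output_spw directory before the concatenation step is started. This is a minimal sketch, assuming four spws (as in --array=0-3) and the same "error code 1" crash criterion the script uses; check_spw_logs is a hypothetical helper name, not part of CARACal.

```shell
#!/bin/bash
# Hypothetical helper: report the outcome of each array task using the same
# criteria as the .sbatch script (log must exist; its last line must not
# contain "error code 1"). Default of 4 spws is an assumption (--array=0-3).
check_spw_logs() {
  local nspw=${1:-4}
  local failed=0
  local i log
  for ((i = 0; i < nspw; i++)); do
    log="output_spw$i/log-caracal.txt"
    if [[ ! -f $log ]]; then
      echo "spw$i: log not found"
      failed=1
    elif [[ $(tail -n 1 "$log") == *"error code 1"* ]]; then
      echo "spw$i: crashed"
      failed=1
    else
      echo "spw$i: ok"
    fi
  done
  # Non-zero if any spw is missing or crashed
  return "$failed"
}
```

For example, `check_spw_logs 4 && echo "all spws done"` prints one status line per spw and only reports success when all four tasks finished cleanly.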

Below is calflag_spw$i.yml for $i = 0 (i.e. calflag_spw0.yml), which is called by the .sbatch file:

schema_version: 1.0.3

general:
  prefix: track1                                                  ### CHECK THIS (track1 OR track2)
  rawdatadir: /idia/projects/fornax/SCI-20180516-PS-01/<id>       ### CHANGE <id>
  output: output_spw0
  backend: singularity

getdata:
  dataid: [<id>_sdp_l0]                                           ### CHANGE <id>
  extension: ms

obsconf:
  refant: auto
  minbase: 200

transform__spw:
  enable: true
  label_in: ''
  label_out: spw0
  field: target
  split_field:
    enable: true
    spw: '0:0~3099'
    otfcal:
      enable: true
      label_cal: 1gc1

flag__sarao:
  enable: true
  field: target
  label_in: spw0
  summary:
    enable: true

flag__spw:
  enable: true
  field: target
  label_in: spw0
  flag_autocorr:
    enable: true
  flag_shadow:
    enable: true
    full_mk64: true
  flag_rfi:
    enable: true
    flagger: aoflagger
    aoflagger:
      strategy: flagtarget_Q.rfis
  inspect:
    enable: true
    time_step: 10

transform__avg:
  enable: true
  label_in: spw0
  label_out: spw0cont
  field: target
  split_field:
    enable: true
    spw: '0:0~2999'
    chan_avg: 150
    col: data

The following spws have different channel selections, and the labels change accordingly (example for spw1):

transform__spw:
  enable: true
  label_in: ''
  label_out: spw1
  field: target
  split_field:
    enable: true
    spw: '0:2900~6099'
    otfcal:
      enable: true
      label_cal: 1gc1

...

transform__avg:
  enable: true
  label_in: spw1
  label_out: spw1cont
  field: target
  split_field:
    enable: true
    spw: '0:100~3099'
    chan_avg: 150
    col: data
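
The channel selections follow a regular pattern: each transform__spw split keeps a 3000-channel chunk plus a 100-channel overlap on each side (none below channel 0), and transform__avg then trims the overlap, so spw0cont covers channels 0~2999 and spw1cont covers channels 3000~5999 of the original band. With chan_avg: 150, each continuum chunk ends up with 3000/150 = 20 channels. The hypothetical helper below reproduces the spw0 and spw1 values shown above; applying the same width and padding to spw2 and spw3 is an assumption, and the sketch does not clip at the upper edge of the band.

```shell
#!/bin/bash
# Hypothetical helper: derive the per-spw channel selections for
# calflag_spw$i.yml. width=3000 and pad=100 are inferred from the spw0/spw1
# examples; their validity for other spws is an assumption.
spw_ranges() {
  local i=$1 width=${2:-3000} pad=${3:-100}
  # Channels kept by transform__spw: chunk plus overlap on both sides
  local lo=$(( i * width - pad ))
  local hi=$(( (i + 1) * width + pad - 1 ))
  if (( lo < 0 )); then
    lo=0  # spw0 has no lower overlap
  fi
  # transform__avg keeps exactly `width` channels, relative to the split start,
  # i.e. absolute channels i*width .. (i+1)*width-1
  local avg_lo=$(( i * width - lo ))
  local avg_hi=$(( avg_lo + width - 1 ))
  echo "split: '0:${lo}~${hi}'  avg: '0:${avg_lo}~${avg_hi}'"
}
```

For example, `for i in 0 1; do spw_ranges "$i"; done` prints the split and avg selections used in the spw0 and spw1 configs above.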

Then cont.yml is called by a non-distributed .sbatch file, in which the datasets are concatenated again:

schema_version: 1.0.6

general:
  prefix: track1                                                  ### CHECK THIS (track1 OR track2)
  rawdatadir: /idia/projects/fornax/SCI-20180516-PS-01/<id>       ### CHANGE <id>
  backend: singularity

getdata:
  dataid: [<id>_sdp_l0]                                           ### CHANGE <id>
  extension: ms

obsconf:
  refant: auto
  minbase: 200

transform__cont:
  enable: true
  label_in: 'spw0cont,spw1cont,spw2cont,spw3cont'
  label_out: cont
  field: target
  split_field:
    enable: false
  concat:
    enable: true
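
Because the .sbatch script exits with status 1 when the log is missing or CARACal crashed, Slurm's afterok dependency can hold back the non-distributed cont.yml job until every array task has succeeded (a dependency on the array job ID applies to all of its tasks). This is a minimal sketch; the file names calflag.sbatch and cont.sbatch and the submit_chain function are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit the distributed calflag array, then the
# non-distributed cont.yml job, which starts only if all array tasks exit 0.
submit_chain() {
  local array_sbatch=${1:-calflag.sbatch}
  local cont_sbatch=${2:-cont.sbatch}
  local array_id
  # --parsable makes sbatch print only "<jobid>" or "<jobid>;<cluster>"
  array_id=$(sbatch --parsable "$array_sbatch") || return 1
  array_id=${array_id%%;*}   # drop the optional ";<cluster>" suffix
  # afterok on the array job ID waits for every task of the array
  sbatch --dependency=afterok:"$array_id" "$cont_sbatch"
}
```

Usage: `submit_chain calflag.sbatch cont.sbatch`. If any spw task fails, the cont job stays pending with an unsatisfiable dependency instead of concatenating incomplete data.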

Fil8 commented 1 year ago

How do we tell CARACal when and how to distribute: