PaulHancock / MWA-fast-image-transients

Code and notes for the data reduction and analysis of fast image transients observed with the MWA

obs_dl.sh chain still breaking #25

Closed by gemmaanderson 6 years ago

gemmaanderson commented 6 years ago

I ran obs_dl.sh on two calibrator datasets today.

The first dataset (below) crashed before progressing from the download step to cotter:

> ./bin/obs_dl.sh -c 1201153248 -n 3C444 1201153248
Submitted /astro/mwasci/phancock/D0009/queue/dl_1201153248.sh as 2570380

The dl script errors are:

> less dl_1201153248.o2570380
Expected 48 files but found 50, assuming something bad happened.
> less dl_1201153248.e2570380
ls: cannot access '1201153248*.mwaf': No such file or directory
slurmstepd: error: Exceeded step memory limit at some point.
slurmstepd: error: Exceeded job memory limit at some point.
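The `ls: cannot access` line above suggests the file check relies on `ls` with a glob, which errors out when nothing matches. As a rough sketch only (the function names `count_files` and `check_count` are hypothetical and not taken from the repository's dl script), a count check that avoids parsing `ls` could look like this. The first two messages mirror the ones in the logs; the logic behind them in the real script is an assumption.

```bash
# Hypothetical helper: count files named <obsid>*<suffix> in a directory.
# Runs in a subshell (note the parentheses) so its variables do not leak.
count_files() (
    dir=$1; obsid=$2; suffix=$3
    n=0
    for f in "$dir"/"$obsid"*"$suffix"; do
        # An unmatched glob stays as the literal pattern, so test existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
)

# Hypothetical check: compare the number found against the number expected.
# An empty expected value is treated as "unknown" and passes.
check_count() {
    if [ -z "$1" ]; then
        echo "Don't know how many files to expect, so assuming everything is ok."
        return 0
    fi
    if [ "$2" -ne "$1" ]; then
        echo "Expected $1 files but found $2, assuming something bad happened."
        return 1
    fi
    echo "Found all $1 files."
}
```

For example, `check_count 48 "$(count_files . 1201153248 .mwaf)"` returns non-zero on a mismatch, so the next step in the chain can be skipped cleanly.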

The second dataset crashed after successfully running dl and cotter, but before it ran calibrate:

> ./bin/obs_dl.sh -c 1212343352 -n HerA 1212343352
Submitted /astro/mwasci/phancock/D0009/queue/dl_1212343352.sh as 2570394

The above gave the following dl errors, but they did not seem to stop it from moving on to cotter:

> less dl_1212343352.o2570394
Don't know how many files to expect, so assuming everything is ok.
> less dl_1212343352.e2570394
/var/spool/slurm/job2570394/slurm_script: line 74: syntax error near unexpected token `fi'
/var/spool/slurm/job2570394/slurm_script: line 74: `fi'
slurmstepd: error: Exceeded step memory limit at some point.
slurmstepd: error: Exceeded job memory limit at some point.
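The `syntax error near unexpected token` message above comes from the shell parser, so it could be caught before the job is queued. A minimal sketch, assuming the generated queue script is plain bash: `bash -n` parses a script without running it. The helper name `syntax_ok` is hypothetical.

```bash
# Hypothetical helper: succeed only if bash can parse the script.
# `bash -n` reads the commands and checks syntax without executing anything.
syntax_ok() {
    bash -n "$1" 2>/dev/null
}
```

Something like `syntax_ok "$script" && sbatch "$script"`, with `$script` pointing at the generated dl script, would then refuse to submit a script that cannot be parsed. This only catches parse errors, not the memory-limit warnings.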

A measurement set was created but calibrate was not run. Cotter gave the following errors:

> less cotter_1212343352.e4655494
+ python bin/track_task.py finish --jobid=4655494 --finish_time=1533025059
slurmstepd: error: Exceeded step memory limit at some point.

gemmaanderson commented 6 years ago

Obs download is to end on 1 Dec, so this is no longer relevant.