caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

Sunblocker running out of memory #1467

Open spectram opened 1 year ago

spectram commented 1 year ago

Sunblocker exits with error code 137 when run on a 50 MHz subband. Please find the log attached.

log-caracal.txt

Athanaseus commented 5 months ago

Traceback:

Timeout set to -1. The container ID is printed below.
# running cd /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 && singularity run --workdir /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 --containall  --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758/stimela_parameter_files/sunblocker_ms0-1406311220628961676445369146244.json:/stimela_mount/configfile:ro --bind /scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/cargo/cab/sunblocker/src:/stimela_mount/code:ro --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758/passwd:/etc/passwd:rw --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758/group:/etc/group:rw --bind /scratch3/projects/meerchoirs/sriram/caracal_env/bin/stimela_runscript:/singularity:ro --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/msdir:/stimela_mount/msdir:rw --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/input:/stimela_mount/input:ro --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1:/stimela_mount/output:rw --bind /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/tmp:/stimela_mount/output/tmp:rw /software/astro/caracal/STIMELA_IMAGES_1.7.5/stimela_sunblocker_1.0.2.sif /singularity
# WARNING: Overriding HOME environment variable with SINGULARITYENV_HOME is not permitted
# Killed
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR: cd /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 && singularity run --workdir /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 --containall returns error code 137
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR: job failed at 2023-02-15 09:41:22.863505 after 0:25:13.590414
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR: Traceback (most recent call last):
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/recipe.py", line 713, in run
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:     job.run_job()
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/recipe.py", line 425, in run_job
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:     self.job.run(output_wrangler=self.apply_output_wranglers)
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/singularity.py", line 123, in run
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:     utils.xrun(f"cd {self.execdir} && singularity run --workdir {self.execdir} --containall",
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/utils/xrun_poll.py", line 227, in xrun
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR:     raise StimelaCabRuntimeError("{} returns error code {}".format(command_name, status))
2023-02-15 09:41:22 CARACal.Stimela.sunblocker-ms0 ERROR: stimela.utils.StimelaCabRuntimeError: cd /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 && singularity run --workdir /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 --containall returns error code 137
2023-02-15 09:41:22 CARACal.Stimela.line__galx__2 INFO: Completed jobs : ['save-P1_line__galx__2_before-ms0']
2023-02-15 09:41:22 CARACal.Stimela.line__galx__2 INFO: Remaining jobs : ['save-P1_line__galx__2_after-mst0']
2023-02-15 09:41:22 CARACal.Stimela.line__galx__2 INFO: Logging remaining task: save-P1_line__galx__2_after-mst0:: Save flag version
2023-02-15 09:41:22 CARACal.Stimela.line__galx__2 INFO: Saving pipeline information in .last_line__galx__2.json
2023-02-15 09:41:22 CARACal ERROR: Job 'sunblocker-ms0:: Block out sun' failed: cd /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 && singularity run --workdir /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 --containall returns error code 137 [PipelineException]
2023-02-15 09:41:22 CARACal INFO:   More information can be found in the logfile at /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/logs-20230215-091505/log-caracal.txt
2023-02-15 09:41:22 CARACal INFO:   You are running version 1.0.6
2023-02-15 09:41:23 CARACal ERROR: Traceback (most recent call last):
2023-02-15 09:41:23 CARACal ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/caracal/main.py", line 189, in __run
2023-02-15 09:41:23 CARACal ERROR:     pipeline.run_workers()
2023-02-15 09:41:23 CARACal ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/caracal/workers/worker_administrator.py", line 441, in run_workers
2023-02-15 09:41:23 CARACal ERROR:     worker.worker(self, recipe, config)
2023-02-15 09:41:23 CARACal ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/caracal/workers/line_worker.py", line 742, in worker
2023-02-15 09:41:23 CARACal ERROR:     recipe.run()
2023-02-15 09:41:23 CARACal ERROR:   File "/scratch3/projects/meerchoirs/sriram/caracal_env/lib/python3.8/site-packages/stimela/recipe.py", line 764, in run
2023-02-15 09:41:23 CARACal ERROR:     raise PipelineException(exc, self.completed, job, self.remaining) from None
2023-02-15 09:41:23 CARACal ERROR: stimela.exceptions.PipelineException: Job 'sunblocker-ms0:: Block out sun' failed: cd /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 && singularity run --workdir /scratch3/projects/meerchoirs/sriram/cartwheel/meerkat/cw_p1/.stimela_workdir-1676445308981758 --containall returns error code 137
2023-02-15 09:41:23 CARACal INFO: exiting with error code 1

Athanaseus commented 5 months ago

Is this still happening with the latest release?
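
As background on the error code 137 in the log: by shell convention an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). That matches the "# Killed" line printed just before the error, and it is consistent with, though not proof of, the kernel OOM killer or a scheduler memory limit stopping the container. A minimal sketch of the decoding follows; `describe_exit_status` is a hypothetical helper for illustration and is not part of CARACal or Stimela.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, not a CARACal/Stimela API)."""
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 137 is the status reported for the sunblocker job in this issue.
    print(describe_exit_status(137))
```

If the kill did come from a memory limit, the system log (for example `dmesg` on the compute node) or the batch scheduler's job accounting would usually record it; that is the place to confirm the cause before changing the sunblocker settings.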