NCAR / CESM_postprocessing

Project repository for the CESM Python-based post-processing code, documentation and issue tracking.

batch jobs do not abort when error occurs #181

Open lvankampenhout opened 5 years ago

lvankampenhout commented 5 years ago

Problem: whenever an error occurs somewhere down in the Python code, the batch job hangs instead of aborting. When I log in to the compute node I see 100% CPU usage. I am not sure whether this is specific to my local cluster (I ported the scripts to the SLURM cluster Cartesius) or caused by the postprocessing scripts themselves. Either way it is clearly sub-optimal, because the jobs have to be aborted manually.

bertinia commented 5 years ago

@lvankampenhout - is there a particular postprocessing task where the job doesn't abort correctly?

lvankampenhout commented 5 years ago

Hi Alice, I encountered this issue with both the lnd_averages and timeseries tasks.

lvankampenhout commented 5 years ago

Strangely enough, my jobs do abort correctly today.