NDCMS / lobster

A userspace workflow management tool for harnessing non-dedicated resources for high-throughput workloads.

cctools-150 segfaults with lobster worker node wrapper? #625

Closed khurtado closed 5 years ago

khurtado commented 6 years ago

Hi, I ran an LHE MC test with lobster using the geoff-crash branch (which I think is now the current master) and cctools-lobster-150, but my jobs were crashing on the worker node (WN) when executing the lobster wrapper (see below). This happened when using factory + runos on opportunistic resources, so I tried again on the T3 with no singularity and no parrot, and still got the same segfault.

If I move back to cctools-lobster-148 (for launching both the lobster master and the workers), my tasks finish successfully.

Has anyone else tested the new cctools version with lobster already? If so, are you seeing the same issue? I just want to confirm before reporting it to the cctools git issues.

==== working directory before execution @ Sat Feb 24 00:58:32 EST 2018 ====
== dir: total 892
== dir: drwxr-xr-x 17 khurtado campus   4096 Feb 24 00:58 CMSSW_7_1_16_patch1
== dir: -rw-r--r--  2 khurtado campus   3597 Feb 24 00:57 HIG-RunIIWinter15wmLHE-00196_1_cfg.py
== dir: drwxr-xr-x  2 khurtado campus   4096 Feb 24 00:57 bin
== dir: -rwxr-xr-x  2 khurtado campus 644940 Feb 24 00:57 cctools-monitor
== dir: -rw-r--r--  1 khurtado campus      0 Feb 24 00:57 cctools-monitor.summary
== dir: drwxrwxrwx  2 khurtado campus   4096 Feb 24 00:57 cctools-temp-t.1.4ReT4Z
== dir: drwxr-xr-x  2 khurtado campus   4096 Feb 24 00:57 lib
== dir: -rwxrwxrwx  1 khurtado campus 149854 Feb 24 00:57 librmonitor_helper.so.Vt0QD0
<snip>
== dir: -rwxr-xr-x  2 khurtado campus  42370 Feb 24 00:57 task.py
== dir: -rwxr-xr-x  2 khurtado campus   6410 Feb 24 00:57 wrapper.sh
wrapper.sh: line 171: 24003 Segmentation fault      $*
==== working directory after execution @ Sat Feb 24 00:58:43 EST 2018 ====
<snip>
=== wrapper done @ Sat Feb 24 00:58:53 EST 2018
=== final return status = 139 @ Sat Feb 24 00:58:53 EST 2018
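
For anyone reading the log above: a return status of 139 is how the shell reports a process killed by a signal (128 plus the signal number), and 139 - 128 = 11, which is SIGSEGV. This matches the "Segmentation fault" line from wrapper.sh. A minimal sketch of that decoding in Python follows; `describe_exit_status` is a hypothetical helper written for illustration and is not part of lobster or cctools.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper, not a lobster API)."""
    if status == 0:
        return "success"
    if status > 128:
        # Shells report "killed by signal N" as 128 + N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # Status taken from the "final return status" line in the log above.
    print(describe_exit_status(139))
```

On Linux this should report signal 11 (SIGSEGV) for the status in the log, so the wrapper's return code is consistent with the wrapped command itself segfaulting rather than exiting with an error of its own.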

klannon commented 5 years ago

We've moved well past cctools-150 so I'm closing this.