Seg-fault in clean_msa_working_space()? That is probably related to running low on memory; that's when we have seen it before.
Well, what is clean_msa_working_space() for? Our working area is the /working disk, which is shared via NFS. Could that have something to do with it? I don't see any indication of excessive memory usage. Node n008, which these jobs were running on, has 1 TB of RAM (8 GB/thread). Here's the fc_run.cfg from the first run in the original post:
[General]
# list of files of the initial bas.h5 files
input_fofn = input.fofn
#input_fofn = preads.fofn
input_type = raw
#input_type = preads
# (integer) estimated number of base-pairs in haplotype
genome_size = 1000000000
# (integer) requested coverage for auto-calculated cutoff
seed_coverage = 40
# The length cutoff used for seed reads used for initial mapping (pre-assembly stage)
# (integer) minimum length of seed-reads used for pre-assembly stage
# If '-1', then auto-calculate the cutoff based on genome_size and seed_coverage.
length_cutoff = -1
#10000
# (integer) minimum length of seed-reads used after pre-assembly, for the "overlap" stage
length_cutoff_pr = 10000
# (string) grid submission system, or "local" (or for "blocking", see wiki)
# case-insensitive
# Supported types include: "sge", "lsf", "pbs", "torque", "slurm", "local"
job_type = sge
# (string) grid job-queue name
# Can be overridden with section-specific sge_option_*
# on Pac, set value = all.q
job_queue = all.q
#sge option
# Grid job distribution options
# No matter what the job_type, we call these "sge_option_*".
# These are ignored for the "local" job_type.
# Typical values:
sge_option = -pe smp 1 -p 500
sge_option_da = -pe smp 8 -p 500
sge_option_la = -pe smp 16 -p 500
sge_option_pda = -pe smp 8 -p 500
sge_option_pla = -pe smp 16 -p 500
sge_option_cns = -pe smp 8 -p 500
sge_option_fc = -pe smp 8 -p 500
# da: daligner (stage-0)
# la: las-merging (stage-0)
# cns: consensus (stage-0)
# pda: daligner on preads (stage-1)
# pla: las-merging on preads (stage-1)
# fc: falcon (stage-2)
##total 248 CPUs available on Pac, the below jobs max = 248/smp = 31 available. #put large number in Q
da_concurrent_jobs = 60
la_concurrent_jobs = 16
pda_concurrent_jobs = 60
pla_concurrent_jobs = 16
cns_concurrent_jobs = 60
fc_concurrent_jobs = 60
# Passed to `HPC.daligner` during pre-assembly stage.
# We will add `-H` based on "length_cutoff".
pa_HPCdaligner_option = -v -B128 -t16 -e.70 -l1000 -s1000 -T8 -M24
# Passed to `HPC.daligner` during overlap stage.
ovlp_HPCdaligner_option = -v -B128 -t32 -h60 -e.96 -l500 -s1000 -T8 -M24
pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -x500 -s400
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --max_n_read 200 --n_core 8
overlap_filtering_setting = --max_diff 60 --max_cov 130 --min_cov 4 --n_core 8
# (boolean string)
# If "true", then skip `LAcheck` during LAmerge/LAsort.
# (Actually, `LAcheck` is run, but failures are ignored.)
# When *daligner* bugs are finally fixed, this will be unnecessary.
skip_checks = true
# (boolean string) whether to run each job in TMPDIR and copy results back to nfs
# If "true", use TMPDIR. (Actually, `tempfile.tmpdir`. See standard Python docs: https://docs.python.org/2/library/tempfile.html )
# If the value looks like a path, then it is used instead of TMPDIR.
#use_tmpdir = false
Our other nodes have less than 8 GB/thread, so we tell Falcon to use less (-T8 -M24 works out to 3 GB/thread). This configuration has been working for us.
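As a quick check of that arithmetic, here is a small sketch that pulls the `-T` (threads) and `-M` (memory in GB) values out of an `HPC.daligner` option string and divides them. It assumes both flags are written as `-T<int>` and `-M<int>`, exactly as in the config above; the helper name `gb_per_thread` is hypothetical.

```python
import re


def gb_per_thread(option_string: str) -> float:
    """Return -M (GB) divided by -T (threads) from an HPC.daligner option string.

    Assumes the flags appear as -T<int> and -M<int>, as in fc_run.cfg above.
    Lowercase -t is a different daligner option and is deliberately not matched.
    """
    threads = re.search(r"(?:^|\s)-T(\d+)(?:\s|$)", option_string)
    memory = re.search(r"(?:^|\s)-M(\d+)(?:\s|$)", option_string)
    if threads is None or memory is None:
        raise ValueError("option string lacks -T or -M")
    return int(memory.group(1)) / int(threads.group(1))


# Values copied from pa_HPCdaligner_option and ovlp_HPCdaligner_option above.
pa_opts = "-v -B128 -t16 -e.70 -l1000 -s1000 -T8 -M24"
ovlp_opts = "-v -B128 -t32 -h60 -e.96 -l500 -s1000 -T8 -M24"
print(gb_per_thread(pa_opts))    # 3.0
print(gb_per_thread(ovlp_opts))  # 3.0
```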
We just had this happen again with the same dataset on our smaller nodes, n005 and n007, which have 6.4 GB/thread, and the Ganglia graphs show no sign of excessive memory usage. The only thing we changed in fc_run.cfg was a smaller split size in pa_DBsplit_option and ovlp_DBsplit_option (from -s400 to -s300):
agi@n005:/working/pacbioService/O_schlecteri/falcon1st$ diff fc_run.cfg ../falcon1st-failed/fc_run.cfg
67,68c67,68
< pa_DBsplit_option = -x500 -s300
< ovlp_DBsplit_option = -x500 -s300
---
> pa_DBsplit_option = -x500 -s400
> ovlp_DBsplit_option = -x500 -s400
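The same comparison can be made option-by-option with Python's standard `configparser`, which ignores comment and whitespace differences that a plain `diff` would report. This is a sketch: the function name `changed_options` is hypothetical, and it assumes the options live in the `[General]` section as in the config above.

```python
import configparser


def changed_options(old_text: str, new_text: str, section: str = "General") -> dict:
    """Return {option: (old_value, new_value)} for options that differ in `section`.

    An option present on only one side is reported with None on the other side.
    """
    def load(text):
        # interpolation=None so characters like '%' in values are taken literally
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep option names case-sensitive (pa_DBsplit_option)
        parser.read_string(text)
        return dict(parser[section]) if parser.has_section(section) else {}

    old, new = load(old_text), load(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


failed = """[General]
# comments are ignored by the comparison
pa_DBsplit_option = -x500 -s400
ovlp_DBsplit_option = -x500 -s400
genome_size = 1000000000
"""
current = """[General]
pa_DBsplit_option = -x500 -s300
ovlp_DBsplit_option = -x500 -s300
genome_size = 1000000000
"""
for key, (before, after) in changed_options(failed, current).items():
    print(f"{key}: {before} -> {after}")
```

In practice the two strings would be read from falcon1st/fc_run.cfg and falcon1st-failed/fc_run.cfg.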
Here is the all.log output for the crashed job:
agi@n005:/working/pacbioService/O_schlecteri/falcon1st$ grep P4734e7cad all.log
'P4734e7cad126a4': Job(jobid='P4734e7cad126a4', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}),
2019-02-22 05:52:04,619 - pwatcher.fs_based - DEBUG - Wrapped "python2.7 -m pwatcher.mains.fs_heartbeat --directory=/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160 --heartbeat-file=/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/heartbeats/heartbeat-P4734e7cad126a4 --exit-file=/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/exits/exit-P4734e7cad126a4 --rate=10.0 /bin/bash run.sh || echo 99 >| /working/pacbioService/O_schlecteri/falcon1st/mypwatcher/exits/exit-P4734e7cad126a4"
2019-02-22 05:52:04,619 - pwatcher.fs_based - DEBUG - Writing wrapper "/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/wrappers/run-P4734e7cad126a4.bash"
2019-02-22 05:52:04,620 - pwatcher.fs_based - INFO - starting job Job(jobid='P4734e7cad126a4', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}) w/ job_type=SGE
2019-02-22 05:52:04,620 - pwatcher.fs_based - DEBUG - CD: u'/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/jobs/P4734e7cad126a4' <- '/working/pacbioService/O_schlecteri/falcon1st'
2019-02-22 05:52:04,621 - pwatcher.fs_based - INFO - !qsub -N P4734e7cad126a4 -q all.q -pe smp 8 -p 500 -V -cwd -o stdout -e stderr -S /bin/bash /working/pacbioService/O_schlecteri/falcon1st/mypwatcher/wrappers/run-P4734e7cad126a4.bash
2019-02-22 05:52:04,634 - pwatcher.fs_based - DEBUG - CD: u'/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/jobs/P4734e7cad126a4' -> '/working/pacbioService/O_schlecteri/falcon1st'
2019-02-22 05:52:04,635 - pwatcher.fs_based - INFO - Submitted backgroundjob=MetaJobSge(MetaJob(job=Job(jobid='P4734e7cad126a4', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}), lang_exe='/bin/bash'))
2019-02-22 05:52:04,667 - pypeflow.simple_pwatcher_bridge - DEBUG - Result of watcher.run()={'submitted': ['Pbef886db774d4d', 'Pac943595357acd', 'P54fdee6bfb3aac', 'Pcad8c42b7ba156', 'P5e2547ca11653e', 'P30f33901af258d', 'Pf15892543ebe9b', 'Pd3efef4b02bb40', 'P0fdaa380224fe8', 'Pedfc3caacaa3d7', 'P717074f9fc1918', 'P6738ded014ec89', 'P38fb70fd9c4a5e', 'P4734e7cad126a4', 'P66d5d68155ce49', 'P4a5a7fe85fe09b']}
2019-02-22 05:52:04,696 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-P4734e7cad126a4
2019-02-22 05:52:04,904 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-P4734e7cad126a4
2019-02-22 05:52:05,131 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-P4734e7cad126a4
The "Status RUNNING" message keeps being printed after this.
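Instead of grepping all.log by hand, a job ID can be mapped to its run directory (the core file ended up under its uow-00 subdirectory) with a short regular-expression sketch. It assumes the `Job(jobid=..., cmd=..., rundir=...)` repr format seen in the log lines above; other pwatcher versions may log differently, and `rundirs_by_jobid` is a hypothetical helper name.

```python
import re

# Matches the Job(...) repr that pwatcher writes into all.log, as in the lines above.
JOB_RE = re.compile(
    r"Job\(jobid='(?P<jobid>[^']+)', cmd='[^']*', rundir=u?'(?P<rundir>[^']+)'"
)


def rundirs_by_jobid(log_lines):
    """Return {jobid: rundir} for every Job(...) repr found in the given lines."""
    found = {}
    for line in log_lines:
        for match in JOB_RE.finditer(line):
            found[match.group("jobid")] = match.group("rundir")
    return found


# One line copied from the log above.
sample = (
    "'P4734e7cad126a4': Job(jobid='P4734e7cad126a4', cmd='/bin/bash run.sh', "
    "rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160', "
    "options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}),"
)
print(rundirs_by_jobid([sample])["P4734e7cad126a4"])
```

Against the real log this would be called with an open file handle, e.g. `rundirs_by_jobid(open("all.log"))`.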
Feb 23 01:32:10 n005 kernel: python[34686]: segfault at 1 ip 00002afe78dd44c3 sp 00007ffd920f5150 error 4 in ext_falcon.so[2afe78dd1000+5000]
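The kernel line carries two useful details: `segfault at 1` is the faulting address (0x1, just past NULL) and `error 4` is the x86 page-fault error code. The sketch below decodes that code using the standard x86 bit layout (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). Read this way, the crash looks like a user-mode read through a nearly-NULL pointer inside ext_falcon.so. That could come from a logic bug or from an allocation that failed and was not checked, so the decode alone does not settle the memory question; `describe_page_fault` is a hypothetical helper name.

```python
def describe_page_fault(error_code: int) -> str:
    """Decode the x86 page-fault error code from a kernel 'segfault ... error N' line."""
    parts = [
        "protection violation" if error_code & 0x1 else "page not present",
        "write" if error_code & 0x2 else "read",
        "user mode" if error_code & 0x4 else "kernel mode",
    ]
    if error_code & 0x8:
        parts.append("reserved bit set")
    if error_code & 0x10:
        parts.append("instruction fetch")
    return ", ".join(parts)


print(describe_page_fault(4))  # page not present, read, user mode
```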
The segfault happened just after the red hatch mark between Fri and Sat in the Ganglia graph, and the system appears to have been using only about 35 of its 256 GB of RAM at that time. Here's the backtrace, again faulting in clean_msa_working_space():
agi@pac:/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00160/uow-00$ gdb /opt/rh/python27/root/usr/bin/python core.34686
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/rh/python27/root/usr/bin/python...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/bin/python2.7.debug...done.
done.
[New Thread 34686]
Reading symbols from /opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...Reading symbols from /usr/lib/debug/lib64/libdl-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libutil.so.1...Reading symbols from /usr/lib/debug/lib64/libutil-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libm.so.6...Reading symbols from /usr/lib/debug/lib64/libm-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so
Reading symbols from /usr/lib64/libffi.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libffi.so.5
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so
Reading symbols from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so...done.
Loaded symbols for /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so
Reading symbols from /usr/lib64/libssl.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libssl.so.1.0.1e.debug...done.
done.
Loaded symbols for /usr/lib64/libssl.so.10
Reading symbols from /usr/lib64/libcrypto.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libcrypto.so.1.0.1e.debug...done.
done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /usr/lib/debug/lib64/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /usr/lib/debug/lib64/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /usr/lib/debug/lib64/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /usr/lib/debug/lib64/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/libz.so.1
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /usr/lib/debug/lib64/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...Reading symbols from /usr/lib/debug/lib64/libkeyutils.so.1.3.debug...done.
done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...Reading symbols from /usr/lib/debug/lib64/libresolv-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...Reading symbols from /usr/lib/debug/lib64/libselinux.so.1.debug...done.
done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so
Core was generated by `python -m falcon_kit.mains.consensus --output_multi --min_idt 0.70 --min_cov 4'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002afe78dd44c3 in clean_msa_working_space () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
Missing separate debuginfos, use: debuginfo-install libffi-3.0.5-3.2.el6.x86_64
(gdb) bt
#0 0x00002afe78dd44c3 in clean_msa_working_space () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#1 0x00002afe78dd4a9c in get_cns_from_align_tags () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#2 0x00002afe78dd4da4 in generate_consensus () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#3 0x00002afe789c5dac in ffi_call_unix64 () from /usr/lib64/libffi.so.5
#4 0x00002afe789c5b34 in ffi_call () from /usr/lib64/libffi.so.5
#5 0x00002afe7879841c in _call_function_pointer (pProc=0x2afe78dd4b60 <generate_consensus>, argtuple=47272431092912, flags=4353,
argtypes=<value optimized out>, restype=<_ctypes.PyCPointerType at remote 0x16e7830>, checker=0x0)
at /usr/src/debug/Python-2.7.13/Modules/_ctypes/callproc.c:841
#6 _ctypes_callproc (pProc=0x2afe78dd4b60 <generate_consensus>, argtuple=47272431092912, flags=4353, argtypes=<value optimized out>, restype=
<_ctypes.PyCPointerType at remote 0x16e7830>, checker=0x0) at /usr/src/debug/Python-2.7.13/Modules/_ctypes/callproc.c:1184
#7 0x00002afe78792202 in PyCFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0)
at /usr/src/debug/Python-2.7.13/Modules/_ctypes/_ctypes.c:3979
#8 0x00002afe720f7d13 in PyObject_Call (func=<_FuncPtr(__name__='generate_consensus') at remote 0x2afe78774050>, arg=<value optimized out>,
kw=<value optimized out>) at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#9 0x00002afe72196b04 in do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4646
#10 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4451
#11 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3063
#12 0x00002afe72199e0e in PyEval_EvalCodeEx (co=0x2afe726a7930, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>,
argcount=1, kws=0x2afe724ac068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3661
#13 0x00002afe72120dd8 in function_call (func=<function at remote 0x2afe7b9ce320>, arg=
((['CCCCATCTTAATCTCTTCTATTTGTATATTTCATTCTGGTTGATTACAGATGGTGCCTTTCTTGAGTGATCGACCAGATTGCTCCTGCTCAGTTAAATTGAATGGCAATGGCCCCCAATACATTCCATTCAGGAAAGTGAAATTGCAATCAAGTTTCTGAAGATTGGGAAAGAACTAAAATTTGTCCTTGGTGAAATTGCTGGTCTGATATTTGTGCAAGAACAAGAAACTGATATAGGTAAGTGGTATTTTACATCAATTTAATACATATTATGTTTCTAATCCATATTGCATTGCTGCATTGTCCTGGTGTCTGGCAAATTATACTTGAGCAGATCTGCAAACATGTAAAGAAACAAATTGGACATTGCTTATAATGTCTGAAAGTGCATAATATTGCAATCAAGTTCTTTACATAGGACCTCATTAATTCCTGATAATTGTTATCACTTTGAGGTTTTATAATTTTTATGACTTAATTGTTTAATTCTCTGTTGAACTGGTTTCATTTTTTGGAGGGTTTAGATTGCTCTGTGAACTGTTAACAGTAGTGATTCTTGTTACCTGCTAGTTCACTCAACAATAGTATGTTTTGTTGCATAGACCTTACTTCCTTTATCTTCCGATTGATTTGTAATCATATGGTACTTTATTCCCCAGCTTTTCTTCTGTGGCAATTGATGAGGTACAATACCTTCAATTGTTAAAGATCGAGATTCCATTCATGGAAAGAGAATCACAGATATACAATACGTGATTTGGCAAACAGAAAGTAAAGGATTCCTGGAAGCATTCTCAAATGGAGAGTTTCAAGGTTGGTTTGCCTGATACATTTCCTATTGATAATATCCATATTACTGTTCTTTTTTGGTTAGTTAGCCGCAAAGTTTCTTCAATGTCTGGTCTTTGTATAATTTGTTGATAATGAAATATGCAATCCTGAATTTCTATCAAACTATGATCTTAGGTGGTGGCAAACTTTTGGGACTAGCTTCAATGC', 'CCCCATCTTAATCTCT...(truncated), kw={}) at /usr/src/debug/Python-2.7.13/Objects/funcobject.c:523
#14 0x00002afe720f7d13 in PyObject_Call (func=<function at remote 0x2afe7b9ce320>, arg=<value optimized out>, kw=<value optimized out>)
at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#15 0x00002afe72196501 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4743
#16 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3102
#17 0x00002afe72199e0e in PyEval_EvalCodeEx (co=0x2afe7b9a7230, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>,
argcount=5, kws=0x2afe724ac068, kwcount=0, defs=0x2afe7b9cbd38, defcount=3, closure=0x0) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3661
#18 0x00002afe72120dd8 in function_call (func=<function at remote 0x2afe7b9f0050>, arg=
(<SimpleQueue(_writer=<_multiprocessing.Connection at remote 0x18645e0>, get=<function at remote 0x2afe7bc42d70>, _reader=<_multiprocessing.Connection at remote 0x18641b0>, put=<function at remote 0x2afe7bc42de8>, _wlock=<Lock(release=<built-in method release of _multiprocessing.SemLock object at remote 0x2afe7ba071b0>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x2afe7ba071b0>, _semlock=<_multiprocessing.SemLock at remote 0x2afe7ba071b0>) at remote 0x2afe7ba0d390>, _rlock=<Lock(release=<built-in method release of _multiprocessing.SemLock object at remote 0x2afe7ba07180>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x2afe7ba07180>, _semlock=<_multiprocessing.SemLock at remote 0x2afe7ba07180>) at remote 0x2afe7ba0d350>) at remote 0x2afe7b9f7910>, <SimpleQueue(_writer=<_multiprocessing.Connection at remote 0x185f460>, get=<function at remote 0x2afe7bc42f50>, _reader=<_multiprocessing.Connection at remote 0x1864a10>, put=<function at remote 0x2...(truncated), kw={}) at /usr/src/debug/Python-2.7.13/Objects/funcobject.c:523
#19 0x00002afe720f7d13 in PyObject_Call (func=<function at remote 0x2afe7b9f0050>, arg=<value optimized out>, kw=<value optimized out>)
at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#20 0x00002afe72196501 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4743
#21 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3102
#22 0x00002afe72197fd3 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4514
#23 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4449
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb)
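The interactive session above stopped at the pager prompt after frame #23. A non-interactive run avoids that and can also record where each shared library was loaded. This Python 3 sketch builds and runs such a gdb command; `-batch`, `-ex`, `set pagination off`, `bt full` and `info sharedlibrary` are standard gdb, while the helper names are hypothetical. Recent gdb releases already disable paging in batch mode; the explicit `set pagination off` is there in case older versions such as the 7.2 shown above do not.

```python
import subprocess


def gdb_backtrace_cmd(executable, core_file):
    """Build a non-interactive gdb command that prints a backtrace and library load addresses."""
    return [
        "gdb", "-batch",               # run the -ex commands in order, then exit
        "-ex", "set pagination off",   # never stop at '---Type <return> to continue'
        "-ex", "bt full",              # backtrace, with locals where debug info exists
        "-ex", "info sharedlibrary",   # where ext_falcon.so and the other libraries were mapped
        executable, core_file,
    ]


def gdb_backtrace(executable, core_file):
    """Run gdb on a core file and return stdout and stderr combined as text."""
    result = subprocess.run(
        gdb_backtrace_cmd(executable, core_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=False,
    )
    return result.stdout
```

From the uow-00 directory, `gdb_backtrace("/opt/rh/python27/root/usr/bin/python", "core.34686")` would return the whole backtrace as one string that can be saved and attached to the issue.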
And here's the same info for the other crashed job:
'Pef81991a561042': Job(jobid='Pef81991a561042', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00001', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}),
2019-02-22 05:52:01,934 - pwatcher.fs_based - DEBUG - Wrapped "python2.7 -m pwatcher.mains.fs_heartbeat --directory=/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00001 --heartbeat-file=/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/heartbeats/heartbeat-Pef81991a561042 --exit-file=/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/exits/exit-Pef81991a561042 --rate=10.0 /bin/bash run.sh || echo 99 >| /working/pacbioService/O_schlecteri/falcon1st/mypwatcher/exits/exit-Pef81991a561042"
2019-02-22 05:52:01,934 - pwatcher.fs_based - DEBUG - Writing wrapper "/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/wrappers/run-Pef81991a561042.bash"
2019-02-22 05:52:01,935 - pwatcher.fs_based - INFO - starting job Job(jobid='Pef81991a561042', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00001', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}) w/ job_type=SGE
2019-02-22 05:52:01,935 - pwatcher.fs_based - DEBUG - CD: u'/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/jobs/Pef81991a561042' <- '/working/pacbioService/O_schlecteri/falcon1st'
2019-02-22 05:52:01,936 - pwatcher.fs_based - INFO - !qsub -N Pef81991a561042 -q all.q -pe smp 8 -p 500 -V -cwd -o stdout -e stderr -S /bin/bash /working/pacbioService/O_schlecteri/falcon1st/mypwatcher/wrappers/run-Pef81991a561042.bash
2019-02-22 05:52:01,979 - pwatcher.fs_based - DEBUG - CD: u'/working/pacbioService/O_schlecteri/falcon1st/mypwatcher/jobs/Pef81991a561042' -> '/working/pacbioService/O_schlecteri/falcon1st'
2019-02-22 05:52:01,980 - pwatcher.fs_based - INFO - Submitted backgroundjob=MetaJobSge(MetaJob(job=Job(jobid='Pef81991a561042', cmd='/bin/bash run.sh', rundir=u'/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00001', options={'job_queue': 'all.q', 'sge_option': '-pe smp 8 -p 500', 'job_type': 'sge'}), lang_exe='/bin/bash'))
2019-02-22 05:52:02,028 - pypeflow.simple_pwatcher_bridge - DEBUG - Result of watcher.run()={'submitted': ['P3ce42c6ff0a731', 'P4cdf6fce85b10e', 'P7cb6535a7a3512', 'Pc3ef2b3b10c37f', 'P4b010bfc1f5832', 'P62cc03a2774bf8', 'P0c43ca3be10116', 'P4df5d4895cfb19', 'P833260d44c27f1', 'P33e29bf1bf26c6', 'Pe7a93ab9113e5b', 'Pfd27cad9d93ad7', 'P2e6b4eafc128c0', 'P96456f479801c2', 'Pe149b6844ae286', 'P1e5f82a8823e73', 'P17c8d47844d7bb', 'P22793815175372', 'P50397dd3a82692', 'Pa71e75e3612e3e', 'P50ae3ae91cd2da', 'Pffed1a9b405bd8', 'P9455d63e76b0ed', 'P92d99c88510341', 'P2254d01ccaa5b5', 'P7404d08c289d92', 'Pc18c86e2fad2ff', 'Pced43a6eedea70', 'P2822d7033abe0c', 'Pb240e78c27c380', 'Pef81991a561042', 'Pa379e00c204178', 'Pa456dca0d7cb87', 'P4e1b99ced84a3b']}
2019-02-22 05:52:02,098 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-Pef81991a561042
2019-02-22 05:52:02,574 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-Pef81991a561042
2019-02-22 05:52:02,986 - pwatcher.fs_based - DEBUG - Status RUNNING for heartbeat:heartbeat-Pef81991a561042
Feb 22 23:50:53 n007 kernel: python[30094]: segfault at 1 ip 00002ac40c6d24c3 sp 00007ffc9730d0d0 error 4 in ext_falcon.so[2ac40c6cf000+5000]
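Comparing the two kernel messages: subtracting the mapping base in the brackets from the instruction pointer gives the offset of the faulting instruction inside ext_falcon.so. In both crashes it comes out to 0x34c3. So the jobs on n005 and n007 appear to have died at the same instruction in clean_msa_working_space(), both reading address 0x1. That suggests, but does not prove, a deterministic fault triggered by particular input rather than a node-specific problem. The sketch below assumes the kernel's `segfault at ... ip ... sp ... error ... in LIB[BASE+SIZE]` message format shown above; `fault_offset` is a hypothetical helper name.

```python
import re

# Matches "segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]" from the kernel log.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(kernel_line: str):
    """Return (library name, offset of the faulting instruction within its mapping)."""
    m = SEGV_RE.search(kernel_line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return m.group("lib"), offset


# The two kernel messages quoted in this thread.
crashes = [
    "Feb 23 01:32:10 n005 kernel: python[34686]: segfault at 1 ip 00002afe78dd44c3 "
    "sp 00007ffd920f5150 error 4 in ext_falcon.so[2afe78dd1000+5000]",
    "Feb 22 23:50:53 n007 kernel: python[30094]: segfault at 1 ip 00002ac40c6d24c3 "
    "sp 00007ffc9730d0d0 error 4 in ext_falcon.so[2ac40c6cf000+5000]",
]
for line in crashes:
    lib, off = fault_offset(line)
    print(lib, hex(off))  # ext_falcon.so 0x34c3
```

If that executable mapping starts at file offset 0, which is typical for a library this small but is an assumption here, the offset could then be looked up with `objdump -d ext_falcon.so` to see which instruction in clean_msa_working_space() does the read.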
This one happened just before the red hatch mark between Fri and Sat in the Ganglia graph, and memory usage still doesn't look excessive at all.
And here's the backtrace for that process:
agi@pac:/working/pacbioService/O_schlecteri/falcon1st/0-rawreads/cns-runs/cns_00001/uow-00$ gdb /opt/rh/python27/root/usr/bin/python core.30094
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-92.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/rh/python27/root/usr/bin/python...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/bin/python2.7.debug...done.
done.
[New Thread 30094]
Reading symbols from /opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/libpython2.7.so.1.0
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...Reading symbols from /usr/lib/debug/lib64/libdl-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libutil.so.1...Reading symbols from /usr/lib/debug/lib64/libutil-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libm.so.6...Reading symbols from /usr/lib/debug/lib64/libm-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_localemodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ctypes.so
Reading symbols from /usr/lib64/libffi.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libffi.so.5
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_struct.so
Reading symbols from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so...done.
Loaded symbols for /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/itertoolsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_functoolsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/stropmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/operator.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_collectionsmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_heapq.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/timemodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cStringIO.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/selectmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/fcntlmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/binascii.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/math.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_socketmodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_ssl.so
Reading symbols from /usr/lib64/libssl.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libssl.so.1.0.1e.debug...done.
done.
Loaded symbols for /usr/lib64/libssl.so.10
Reading symbols from /usr/lib64/libcrypto.so.10...Reading symbols from /usr/lib/debug/usr/lib64/libcrypto.so.1.0.1e.debug...done.
done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libgssapi_krb5.so.2...Reading symbols from /usr/lib/debug/lib64/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...Reading symbols from /usr/lib/debug/lib64/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...Reading symbols from /usr/lib/debug/lib64/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...Reading symbols from /usr/lib/debug/lib64/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/libz.so.1
Reading symbols from /lib64/libkrb5support.so.0...Reading symbols from /usr/lib/debug/lib64/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...Reading symbols from /usr/lib/debug/lib64/libkeyutils.so.1.3.debug...done.
done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...Reading symbols from /usr/lib/debug/lib64/libresolv-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1...Reading symbols from /usr/lib/debug/lib64/libselinux.so.1.debug...done.
done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_io.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/future_builtins.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_multiprocessing.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/cPickle.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/arraymodule.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_hashlib.so
Reading symbols from /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so...Reading symbols from /usr/lib/debug/opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so.debug...done.
done.
Loaded symbols for /opt/rh/python27/root/usr/lib64/python2.7/lib-dynload/_randommodule.so
Core was generated by `python -m falcon_kit.mains.consensus --output_multi --min_idt 0.70 --min_cov 4'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002ac40c6d24c3 in clean_msa_working_space () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
Missing separate debuginfos, use: debuginfo-install libffi-3.0.5-3.2.el6.x86_64
(gdb) bt
#0 0x00002ac40c6d24c3 in clean_msa_working_space () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#1 0x00002ac40c6d2a9c in get_cns_from_align_tags () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#2 0x00002ac40c6d2da4 in generate_consensus () from /opt/pacbio/falcon-2018.03.12-04.00-py2.7-ucs4/lib/python2.7/site-packages/ext_falcon.so
#3 0x00002ac40c2c3dac in ffi_call_unix64 () from /usr/lib64/libffi.so.5
#4 0x00002ac40c2c3b34 in ffi_call () from /usr/lib64/libffi.so.5
#5 0x00002ac40c09641c in _call_function_pointer (pProc=0x2ac40c6d2b60 <generate_consensus>, argtuple=< at remote 0x7ffc9730d4e0>, flags=4353,
argtypes=<value optimized out>, restype=<_ctypes.PyCPointerType at remote 0xf99830>, checker=0x0)
at /usr/src/debug/Python-2.7.13/Modules/_ctypes/callproc.c:841
#6 _ctypes_callproc (pProc=0x2ac40c6d2b60 <generate_consensus>, argtuple=< at remote 0x7ffc9730d4e0>, flags=4353, argtypes=<value optimized out>, restype=
<_ctypes.PyCPointerType at remote 0xf99830>, checker=0x0) at /usr/src/debug/Python-2.7.13/Modules/_ctypes/callproc.c:1184
#7 0x00002ac40c090202 in PyCFuncPtr_call (self=<value optimized out>, inargs=<value optimized out>, kwds=0x0)
at /usr/src/debug/Python-2.7.13/Modules/_ctypes/_ctypes.c:3979
#8 0x00002ac4059f5d13 in PyObject_Call (func=<_FuncPtr(__name__='generate_consensus') at remote 0x2ac40c072050>, arg=<value optimized out>,
kw=<value optimized out>) at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#9 0x00002ac405a94b04 in do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4646
#10 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4451
#11 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3063
#12 0x00002ac405a97e0e in PyEval_EvalCodeEx (co=0x2ac405fa5930, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>,
argcount=1, kws=0x2ac405daa068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3661
#13 0x00002ac405a1edd8 in function_call (func=<function at remote 0x2ac40f2cc320>, arg=
((['AGGTCAATACTGAAGCAGGTCCTAAGCAATAAATGAAGGAGTTAGTTTTAGAACAAGCTATAATTACTCTTTTTTACAATTTGATCATATACCACTCACTGTACTATGTGTTATAGGCCAAAACCTCAAATACGAACTAATGCTCCAAAGATAACAATGGCAACCAGTCCTACAAAGAGACAGCTCCAAGGGTAAATAAAGCACCACTCACTAACTCTACTCCTCCTACGAGAATCAGCTCTTGTAGTGCTAGTTCTAAGTCCTTTGACAAGAAGTAGGAAAGTTAGTGGAAGCTAAACATTTACGAAGCAACCATGAAGTTGTTATCTTTGTTAGTTGCATGCTTTTGTTGCTAGACCGTAAACCACTGATCATGGTTTTATTCATTACTATCGTAATGTTTATTTTGCTGCTCTTAAAAACCTGGATGTGAACTATTGTGTAAGAAGTATGTCTGCCTATGTTGGGAGAGTATTCAACTCTGCACCGTTGTTAACTAATGTTATGCACTTCCCACCTAGTTGAATTTTTTTTTGTAAAAGCTGCAATTCTGAATGTTCTAGTAGACAATGCATTCACTTTACTGAAATAGTATTTCACATACAACTAACAAGAAAATGCACATATTACAACTTCCAAATTATCTACAACTGTAAGGAACACAACTAAAACATAAACTATGTTGATATTGAAGTCCATCATATCAAATCGCCTTCCTATTCCCTGCAAACTTCAAATCATTTCTTCCATTCGTTGTTGCGGCCGTTTTGTTTCTAACAAAAGCCTCATTTCATCTTCAAAACCAAAACTGATACCAGATTGGCTCGTCCTACGACAAAGAAAAATCATCACATTGGGACTGATGCAACTGCCAAAAAATAAAACTTTGTAAGTGCTAATCATTCAACTCCACCGACGAACTGGGAACTTGTAGTAGTGCCGGCCTAGTTCTTGATGTTCTGATGTCTTCCGCCAAACCATCACTTGCAGTTGAGGCACC', 'AGGTCAATACTGAAGC...(truncated), kw={}) at /usr/src/debug/Python-2.7.13/Objects/funcobject.c:523
#14 0x00002ac4059f5d13 in PyObject_Call (func=<function at remote 0x2ac40f2cc320>, arg=<value optimized out>, kw=<value optimized out>)
at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#15 0x00002ac405a94501 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4743
#16 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3102
#17 0x00002ac405a97e0e in PyEval_EvalCodeEx (co=0x2ac40f2a5230, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>,
argcount=5, kws=0x2ac405daa068, kwcount=0, defs=0x2ac40f2c9d38, defcount=3, closure=0x0) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3661
#18 0x00002ac405a1edd8 in function_call (func=<function at remote 0x2ac40f2ee050>, arg=
(<SimpleQueue(_writer=<_multiprocessing.Connection at remote 0x11165e0>, get=<function at remote 0x2ac40f540d70>, _reader=<_multiprocessing.Connection at remote 0x11161b0>, put=<function at remote 0x2ac40f540de8>, _wlock=<Lock(release=<built-in method release of _multiprocessing.SemLock object at remote 0x2ac40f3051b0>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x2ac40f3051b0>, _semlock=<_multiprocessing.SemLock at remote 0x2ac40f3051b0>) at remote 0x2ac40f30b390>, _rlock=<Lock(release=<built-in method release of _multiprocessing.SemLock object at remote 0x2ac40f305180>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x2ac40f305180>, _semlock=<_multiprocessing.SemLock at remote 0x2ac40f305180>) at remote 0x2ac40f30b350>) at remote 0x2ac40f2f5910>, <SimpleQueue(_writer=<_multiprocessing.Connection at remote 0x1111460>, get=<function at remote 0x2ac40f540f50>, _reader=<_multiprocessing.Connection at remote 0x1116a10>, put=<function at remote 0x2...(truncated), kw={}) at /usr/src/debug/Python-2.7.13/Objects/funcobject.c:523
#19 0x00002ac4059f5d13 in PyObject_Call (func=<function at remote 0x2ac40f2ee050>, arg=<value optimized out>, kw=<value optimized out>)
at /usr/src/debug/Python-2.7.13/Objects/abstract.c:2547
#20 0x00002ac405a94501 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4743
#21 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:3102
#22 0x00002ac405a95fd3 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4514
#23 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7.13/Python/ceval.c:4449
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb)
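In the backtrace above, the crash happens in native code that Python reaches through ctypes (`generate_consensus` calls `get_cns_from_align_tags`, which calls `clean_msa_working_space`), inside a `multiprocessing` worker. As a result, Python never gets the chance to print its own traceback. A minimal sketch of one diagnostic, assuming Python 3 (it is not something FALCON does itself): the standard-library `faulthandler` module dumps the Python-level frames when a segfault occurs. The crashing child below is a hypothetical stand-in that uses `ctypes.string_at(0)` to dereference NULL. For the Python 2.7 build in this release, `faulthandler` would have to come from the separate backport package, which is an assumption about your environment.

```python
import subprocess
import sys

# Hypothetical stand-in for a consensus worker: a Python function that calls
# into native code which dereferences a NULL pointer, much as the C routine
# clean_msa_working_space() crashed in this issue.
CRASHING_CHILD = """
import ctypes
import faulthandler

# Dump the Python stack to stderr on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
faulthandler.enable()

def consensus_worker():
    # Reading a C string at address 0 segfaults inside native code.
    ctypes.string_at(0)

consensus_worker()
"""


def run_with_faulthandler(source):
    """Run `source` in a fresh interpreter; return (returncode, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_with_faulthandler(CRASHING_CHILD)
    # On Linux a negative code means the child was killed by that signal (-11 = SIGSEGV).
    print("exit status:", code)
    print(err)
```

The stderr output names `consensus_worker` and the line that was executing when the process died. In a real consensus job, that would point at the Python call site that handed data to the C code.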
Here's the memory usage on the NFS server; it doesn't look excessive either:
These graphs might not have the resolution to show a sudden memory spike, though. Are there any other ideas, or changes to the config, that we could try?
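Coarse monitoring graphs can average away a short-lived spike, so one option is to record each job's own peak memory on the compute node that runs it. Below is a minimal sketch, assuming Python 3 and a hypothetical wrapper placed around the consensus job's existing command line (this is not a FALCON feature). It uses the standard `resource` module to read the peak resident set size after the command exits. Without any code, GNU `/usr/bin/time -v` reports the same high-water mark as "Maximum resident set size (kbytes)".

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run `cmd` to completion; return (returncode, peak RSS of the largest child).

    RUSAGE_CHILDREN covers terminated, waited-for children (and their
    descendants, if each intermediate process waited on its own children).
    For ru_maxrss it reports the RSS of the single largest such process, not
    the sum over the process tree. Units: kilobytes on Linux, bytes on macOS.
    The value is a high-water mark over every child this process has reaped,
    so use one fresh wrapper process per job.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Hypothetical demo: a child that touches about 64 MiB before exiting.
    demo = [sys.executable, "-c", "buf = b'x' * (64 * 1024 * 1024)"]
    rc, peak = run_and_report_peak_rss(demo)
    print("exit status:", rc, "peak RSS (KiB on Linux):", peak)
```

The consensus step runs several `multiprocessing` workers, so "largest single process" is the relevant figure. Each worker has its own address space and its own peak.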
Thanks
This is disturbing. Debugging a crash in a Python shared library remotely is very difficult.
If you have some sway with PB management, you could escalate this through other channels. Then, if you provide a test case, we could try to reproduce it locally and debug it.
Otherwise, you'll have to do most of the work yourself. You need to build the Python shared library with debug symbols. (Google that; it's not trivial.) Then get a stack trace from the C crash. Then repost (and link this Issue) in our pbbioconda repo, since we generally ignore Issues in the old repos.
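Part of the stack-trace step can be scripted. The sketch below assumes Python 3 and that the core files sit in the cns-runs job directories, as described in this issue. It finds the cores and builds a `gdb -batch` command for each one, so the backtrace is captured in full instead of stopping at the pager prompt. The helper names and the core-file naming pattern are assumptions. Rebuilding ext_falcon.so with debug symbols, the hard part, is not covered here; without it, frames inside ext_falcon.so still show only function names.

```python
import glob
import os
import shlex


def find_core_dumps(run_dir):
    """Return core files found in <run_dir>/0-rawreads/cns-runs/*/.

    Assumes cores are written into the job's working directory with names
    starting with "core" (e.g. "core" or "core.<pid>"); the actual name and
    location depend on the kernel's core_pattern setting.
    """
    pattern = os.path.join(run_dir, "0-rawreads", "cns-runs", "*", "core*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def gdb_backtrace_argv(python_exe, core_path):
    """Build a non-interactive gdb command that prints every thread's backtrace.

    -batch exits after the -ex commands and disables pagination, so the
    "---Type <return> to continue" prompt never cuts the trace short.
    "bt full" also prints local variables for frames that have debug info.
    """
    return ["gdb", "-batch", "-ex", "thread apply all bt full", python_exe, core_path]


if __name__ == "__main__":
    # Hypothetical inputs: the interpreter gdb loaded in this issue, and the current run directory.
    python_exe = "/opt/rh/python27/root/usr/bin/python"
    for core in find_core_dumps("."):
        print(" ".join(shlex.quote(arg) for arg in gdb_backtrace_argv(python_exe, core)))
```

Each printed line can be run as-is and its output redirected to a file that you attach to the new issue.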
Also, you might be better off using the latest builds from Bioconda's pb-assembly recipe. We definitely cannot spend time debugging the older "tarball" release, and the source code publicly available in this repo is way out of date.
We are using the latest stable binary release, falcon-2018.03.12-04.00-py2.7-ucs4, on CentOS 6.9 and are getting a segfault in a couple of jobs:
These jobs are still stuck in the grid queue. If we look up the "P" number for the jobs in all.log, we are led to the 0-rawreads/cns-runs/cns_00120 and 0-rawreads/cns-runs/cns_00001 directories. There's a core dump in each directory. Here is some output from gdb with a backtrace: