cschin / Peregrine

Peregrine: Fast Genome Assembler Using SHIMMER Index

pypeflow error: unable to run the assembly #23

Open sum732 opened 4 years ago

sum732 commented 4 years ago

Hello, I am trying to run Peregrine on CCS reads and keep getting the following error message. I am not sure what the issue is.

Here is the command I used. The FASTQ files were generated from the CCS output with bam2fastx (PacBio).

echo 'yes' | pg_run.py asm /data/AnalysisWithsubReads/analysis/Peregrine/read.list 10 10 10 10 10 10 10 10 10 --with-consensus --shimmer-r 3 --best_n_ovlp 8 --output $(pwd)/lbc0001--lbc0001 

Message shown in STDOUT

INFO:pypeflow.simple_pwatcher_bridge:In simple_pwatcher_bridge, pwatcher_impl=<module 'pwatcher.blocking' from '/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pwatcher/blocking.py'>
INFO:pypeflow.simple_pwatcher_bridge:job_type='local', (default)job_defaults={'njobs': 1, 'NPROC': 1, 'MB': 24000, 'submit': 'bash -C ${CMD} >| ${STDOUT_FILE} 2>| ${STDERR_FILE}', 'job_type': 'local', 'pwatcher_type': 'blocking'}, use_tmpdir=None, squash=False, job_name_style=0
INFO:pypeflow.simple_pwatcher_bridge:Setting max_jobs to 1; was None
INFO:pypeflow.simple_pwatcher_bridge:Num unsatisfied: 1, graph: 1
INFO:pypeflow.simple_pwatcher_bridge:About to submit: Node(lbc0001--lbc0001/0-seqdb)
INFO:pwatcher.blocking:Popen: '/bin/bash -C /lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pwatcher/mains/job_start.sh >| /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/run-Pfbd628d10ca7b2.bash.stdout 2>| /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/run-Pfbd628d10ca7b2.bash.stderr'
INFO:pypeflow.simple_pwatcher_bridge:(slept for another 0.0s -- another 1 loop iterations)
INFO:pypeflow.simple_pwatcher_bridge:(slept for another 0.30000000000000004s -- another 2 loop iterations)
ERROR:pypeflow.simple_pwatcher_bridge:Task Node(lbc0001--lbc0001/0-seqdb) failed with exit-code=1
ERROR:pypeflow.simple_pwatcher_bridge:Some tasks are recently_done but not satisfied: {Node(lbc0001--lbc0001/0-seqdb)}
ERROR:pypeflow.simple_pwatcher_bridge:ready: set()
        submitted: set()
ERROR:pwatcher.blocking:Noop. We cannot kill blocked threads. Hopefully, everything will die on SIGTERM.
Traceback (most recent call last):
  File "/bin/pg_run.py", line 4, in <module>
    __import__('pkg_resources').run_script('peregrine==0.1.5.3', 'pg_run.py')
  File "/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/lib/python3.7/site-packages/peregrine-0.1.5.3-py3.7-linux-x86_64.egg/EGG-INFO/scripts/pg_run.py", line 651, in <module>
    main(args)
  File "/lib/python3.7/site-packages/peregrine-0.1.5.3-py3.7-linux-x86_64.egg/EGG-INFO/scripts/pg_run.py", line 566, in main
    read_db_abs_prefix, read_db = run_build_db(wf, args, seq_dataset_lst)
  File "/lib/python3.7/site-packages/peregrine-0.1.5.3-py3.7-linux-x86_64.egg/EGG-INFO/scripts/pg_run.py", line 216, in run_build_db
    wf.refreshTargets()
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/simple_pwatcher_bridge.py", line 278, in refreshTargets
    self._refreshTargets(updateFreq, exitOnFailure)
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/simple_pwatcher_bridge.py", line 362, in _refreshTargets
    raise Exception(msg)
Exception: Some tasks are recently_done but not satisfied: {Node(lbc0001--lbc0001/0-seqdb)}

Message shown inside the 0-seqdb STDERR

executable=${PYPEFLOW_JOB_START_SCRIPT}
+ executable=/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/run-Pfbd628d10ca7b2.bash
timeout=${PYPEFLOW_JOB_START_TIMEOUT:-60} # wait 60s by default
+ timeout=60

# Wait up to timeout seconds for the executable to become "executable",
# then exec.
#timeleft = int(timeout)
while [[ ! -x "${executable}" ]]; do
    if [[ "${timeout}" == "0" ]]; then
        echo "timed out waiting for (${executable})"
        exit 77
    fi
    echo "not executable: '${executable}', waiting ${timeout}s"
    sleep 1
    timeout=$((timeout-1))
done
+ [[ ! -x /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/run-Pfbd628d10ca7b2.bash ]]

/bin/bash ${executable}
+ /bin/bash /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/run-Pfbd628d10ca7b2.bash
+ '[' '!' -d /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb ']'
+ cd /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb
+ eval '/bin/bash run.sh'
++ /bin/bash run.sh
export PATH=$PATH:/bin
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin
cd /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb
+ cd /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb
/bin/bash task.sh
+ /bin/bash task.sh
pypeflow 2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476
2019-12-09 14:47:08,133 - root - DEBUG - Running "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/task.json"
2019-12-09 14:47:08,135 - root - DEBUG - Checking existence of '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/task.json' with timeout=30
2019-12-09 14:47:08,135 - root - DEBUG - Loading JSON from '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/task.json'
2019-12-09 14:47:08,136 - root - DEBUG - {'bash_template_fn': 'template.sh',
 'inputs': {'seq_dataset': '../../read.list'},
 'outputs': {'read_db': 'seq_dataset.seqdb', 'seqidx': 'seq_dataset.idx'},
 'parameters': {'pypeflow_mb': 4000,
                'pypeflow_nproc': 1,
                'read_db_prefix': '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/seq_dataset'}}
2019-12-09 14:47:08,136 - root - WARNING - CD: '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb' <- '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb'
2019-12-09 14:47:08,136 - root - DEBUG - Checking existence of '../../read.list' with timeout=30
2019-12-09 14:47:08,136 - root - DEBUG - Checking existence of 'template.sh' with timeout=30
2019-12-09 14:47:08,136 - root - WARNING - CD: '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb' <- '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb'
2019-12-09 14:47:08,137 - root - INFO - $('/bin/bash user_script.sh')
hostname
+ hostname
pwd
+ pwd
date
+ date
# Substitution will be similar to snakemake "shell".

/usr/bin/time shmr_mkseqdb     -p /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/seq_dataset     -d ../../read.list
+ /usr/bin/time shmr_mkseqdb -p /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/seq_dataset -d ../../read.list
user_script.sh: line 10: /usr/bin/time: No such file or directory
2019-12-09 14:47:08,153 - root - WARNING - Call '/bin/bash user_script.sh' returned 32512.
2019-12-09 14:47:08,154 - root - WARNING - CD: '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb' -> '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb'
2019-12-09 14:47:08,154 - root - WARNING - CD: '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb' -> '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb'
2019-12-09 14:47:08,155 - root - CRITICAL - Error in /lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py with args="{'json_fn': '/data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb/task.json',\n 'timeout': 30,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 267, in <module>
    main()
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 259, in main
    run(**vars(parsed_args))
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 253, in run
    run_cfg_in_tmpdir(cfg, tmpdir, '.')
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 228, in run_cfg_in_tmpdir
    run_bash(bash_template, myinputs, myoutputs, parameters)
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/do_task.py", line 187, in run_bash
    util.system(cmd)
  File "/lib/python3.7/site-packages/pypeflow-2.1.1+git.d63b0e79f5a7b2d370b7de84a890f88271afa476-py3.7.egg/pypeflow/io.py", line 29, in syscall
    raise Exception(msg)
Exception: Call '/bin/bash user_script.sh' returned 32512.
+++ pwd
++ echo 'FAILURE. Running top in /data/AnalysisWithsubReads/analysis/Peregrine/lbc0001--lbc0001/0-seqdb (If you see -terminal database is inaccessible- you are using the python bin-wrapper, so you will not get diagnostic info. No big deal. This process is crashing anyway.)'
++ rm -f top.txt
++ which python
++ which top
++ env -u LD_LIBRARY_PATH top -b -n 1
++ env -u LD_LIBRARY_PATH top -b -n 1
++ pstree -apl
env: 'top': No such file or directory
task.sh: line 10: pstree: command not found

real    0m0.242s
user    0m0.176s
sys     0m0.061s
+ finish
+ echo 'finish code: 1'

I hope the messages above help to identify the issue and a possible solution.

Many Thanks, Sudeep

cschin commented 4 years ago

It seems you are missing the command /usr/bin/time: the 0-seqdb STDERR shows "user_script.sh: line 10: /usr/bin/time: No such file or directory". You may need to install GNU time (https://www.gnu.org/software/time/), or you can use the Docker build.
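
A quick way to confirm this diagnosis before rerunning is sketched below. It is not part of Peregrine or pypeflow: the helper names `decode_status` and `missing_tools` are made up for illustration. It assumes the "returned 32512" in the log is a raw wait status (as Python's os.system reports on Linux), so the real exit code is in the high byte. It also checks for top and pstree, which the log shows are absent too, although those only affect the failure diagnostics.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Peregrine or pypeflow.

# Print the exit code encoded in a raw wait status such as the 32512 in the log.
# Assumption: the logged value is a wait status, so the exit code is status >> 8.
decode_status() {
    echo $(( $1 >> 8 ))
}

# Print each tool that is missing, one per line.
# Absolute paths are tested directly; bare names are looked up on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        if [[ "$tool" == /* ]]; then
            [[ -x "$tool" ]] || echo "$tool"
        else
            command -v "$tool" >/dev/null 2>&1 || echo "$tool"
        fi
    done
}

# Exit code 127 is what bash reports when a command cannot be found.
echo "user_script.sh exit code: $(decode_status 32512)"

# /usr/bin/time is called with its full path by the task script in the log;
# top and pstree are only used when the task reports a failure.
missing=$(missing_tools /usr/bin/time top pstree)
if [[ -n "$missing" ]]; then
    echo "missing tools:"
    echo "$missing"
else
    echo "all checked tools are present"
fi
```

If /usr/bin/time is listed as missing, installing GNU time should let the 0-seqdb task get past this step. On Debian or Ubuntu the `time` package usually provides it, and `procps` and `psmisc` usually provide top and pstree. The distribution used in this thread is not stated, so treat those package names as assumptions.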