PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

running unzip using raw or preads assembly failing at track_reads.sh #119

Closed rob123king closed 6 years ago

rob123king commented 6 years ago

I have completed both a raw-read and a pread assembly using FALCON. FALCON-Unzip fails for both.

Starting from the preads assembly, fc_unzip.py creates the "read_maps" folder and its contents in "2-asm-falcon", including the file "pread_to_contigs", but it seems to fail at the track_reads step. Any ideas what is going wrong here? The FALCON stage of the assembly went fine, and I tested pread length changes using preads as input.

I can't see from the error why it is failing. The log is below:

before Read_Overlap record j = 8 out of 1858188 at /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00025/preads.25.las Mon Apr 30 13:33:15 2018
before Read_Overlap record j = 9 out of 1858188 at /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00025/preads.25.las Mon Apr 30 13:33:15 2018
before Read_Overlap record j = 1000000 out of 1858188 at /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00025/preads.25.las Mon Apr 30 13:33:18 2018

completed loop record j = 1858188 out of 1858188 at /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00025/preads.25.las Mon Apr 30 13:33:20 2018

[106453]maxrss:   363224
[106453]finished run_tr_stage1('/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/preads.db', '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00043/preads.43.las', 2500, 40, dict(64521 elem))
[106450]maxrss:   497508
[106450]finished run_tr_stage1('/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/preads.db', '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00029/preads.29.las', 2500, 40, dict(64521 elem))
[106451]maxrss:   370412
[106451]finished run_tr_stage1('/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/preads.db', '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/1-preads_ovl/m_00025/preads.25.las', 2500, 40, dict(64521 elem))
[106447]finished track_reads
#mkdir -p 3-unzip/reads/
python -m falcon_kit.mains.fetch_reads
+ python -m falcon_kit.mains.fetch_reads
Traceback (most recent call last):
  File "/home/apps/python/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/apps/python/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/falcon_kit/mains/fetch_reads.py", line 153, in <module>
    main()
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/falcon_kit/mains/fetch_reads.py", line 149, in main
    fetch_ref_and_reads(**vars(args))
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/falcon_kit/mains/fetch_reads.py", line 70, in fetch_ref_and_reads
    assert read_set, 'Empty read_set. Maybe empty {!}?'.format(map_fn)
ValueError: end of format while looking for conversion specifier
touch /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/track_reads_done.exit
+ touch /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/track_reads_done.exit
2018-04-30 13:35:11,011 - root - DEBUG - CD: '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads' -> '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads'
2018-04-30 13:35:11,023 - root - DEBUG - CD: '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads' -> '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads'
2018-04-30 13:35:11,026 - root - CRITICAL - Error in /home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py with args="{'json_fn': '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/task.json',\n 'timeout': 60,\n 'tmpdir': None}"
Traceback (most recent call last):
  File "/home/apps/python/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/apps/python/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 217, in <module>
    main()
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 209, in main
    run(**vars(parsed_args))
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 203, in run
    run_cfg_in_tmpdir(cfg, tmpdir)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 176, in run_cfg_in_tmpdir
    run_python(python_function_name, myinputs, myoutputs, parameters)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 107, in run_python
    run_python_func(func, myinputs, myoutputs, parameters)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py", line 102, in run_python_func
    do_support.run_bash(script_fn)
  File "/home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_support.py", line 51, in run_bash
    raise Exception('{} <- {!r}'.format(rc, cmd))
Exception: 256 <- '/bin/bash -vex /home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/track_reads.sh'
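
A note on the first traceback above: the ValueError looks like a secondary failure. The assert in fetch_ref_and_reads only evaluates its message when read_set is empty, and the message template contains '{!}', a conversion marker with no conversion character, which str.format rejects. So the underlying condition appears to be an empty read_set, with the format error hiding the intended message. The sketch below reproduces this in isolation. check_read_set, the two message helpers, and the sample path are hypothetical names for illustration, not part of falcon_kit, and '{!r}' is only an assumed intended form of the template.

```python
def broken_message(map_fn):
    # Same template as in the traceback: '{!}' has no conversion character,
    # so str.format raises ValueError before any message is produced.
    return 'Empty read_set. Maybe empty {!}?'.format(map_fn)


def fixed_message(map_fn):
    # Assumed intent: '{!r}' inserts repr(map_fn) into the message.
    return 'Empty read_set. Maybe empty {!r}?'.format(map_fn)


def check_read_set(read_set, map_fn, make_message):
    # Hypothetical stand-in for the check in fetch_ref_and_reads.
    # The message expression is evaluated only when read_set is empty.
    assert read_set, make_message(map_fn)


if __name__ == '__main__':
    map_fn = 'read_maps/some_map_file'  # illustrative path only
    for make_message in (broken_message, fixed_message):
        try:
            check_read_set(set(), map_fn, make_message)
        except ValueError as exc:
            print('ValueError: {}'.format(exc))
        except AssertionError as exc:
            print('AssertionError: {}'.format(exc))
```

With the broken template the empty set surfaces as a ValueError, as in the log. With the assumed fix it surfaces as an AssertionError that names the map file, which points more directly at the missing or empty input.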

Config files and locations (although the temp directory setting is not being picked up):


[General]
job_type = local

[Unzip]
smrt_bin=/home/data/bioinf_resources/programming_tools/falcontest/bin/
input_type = preads
input_fofn= input.fofn
input_bam_fofn= input_bam.fofn
unzip_concurrent_jobs = 14
quiver_concurrent_jobs = 14

export TMPDIR=/das_data/falcon

use_tmpdir=true


> bam input file

/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-35pM_01.bam
/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-35pM_02.bam
/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-35pM_03.bam
/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-35pM_04.bam
/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-A_titration_25pM.bam
/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig-A_titration_35pM.bam


> preads input file

/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7/1-preads_ovl/db2falcon/preads4falcon.fasta.outfile.fasta

rob123king commented 6 years ago

I tried adding "tmpdir": "/das_data", to the JSON, since I thought tmpdir might not be set, but it still gave an error:

2018-04-30 13:53:51,575 - root - CRITICAL - Error in /home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py with args="{'json_fn': '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/task.json',\n 'timeout': 60,\n 'tmpdir': None}"

rob123king commented 6 years ago

I corrected task.sh with this instead, but it made no difference. It seems like the JSON has something wrong, so maybe it is just a configuration problem?

{
  "inputs": {
    "falcon_asm_done": "/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/2-asm-falcon/falcon_asm_done"
  },
  "outputs": {
    "ctg_list_file": "ctg_list",
    "job_done": "track_reads_done"
  },
  "parameters": {
    "config": {
      "job_queue": "default",
      "job_type": "local",
      "pwatcher_type": "fs_based",
      "sge_blasr_aln": " -pe smp 24 -q bigmem ",
      "sge_hasm": " -pe smp 48 -q bigmem",
      "sge_phasing": " -pe smp 12 -q bigmem",
      "sge_track_reads": " -pe smp 12 -q bigmem",
      "smrt_bin": "/home/data/bioinf_resources/programming_tools/falcontest/bin/",
      "unzip_blasr_concurrent_jobs": 8,
      "unzip_phasing_concurrent_jobs": 8
    },
    "sge_option": " -pe smp 12 -q bigmem",
    "wd": "/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads"
  },
  "python_function": "falcon_unzip.unzip.task_track_reads"
}

2018-04-30 15:21:36,543 - root - CRITICAL - Error in /home/data/bioinf_resources/programming_tools/falcontest/lib/python2.7/site-packages/pypeflow-1.0.0-py2.7.egg/pypeflow/do_task.py with args="{'json_fn': '/home/data/pisa_ngs/Backup_Genetics/PacBio/Fig_Falcon7s/3-unzip/reads/task.json',\n 'timeout': 60,\n 'tmpdir': '/das_data'}"

rob123king commented 6 years ago

OK, so linking the 0-reads folder from a raw assembly run solved this problem.
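
For anyone hitting the same failure, the workaround amounts to making the raw-read directory of an earlier run visible inside the unzip run directory. A minimal sketch of that linking step follows. The function name link_run_dir and any paths passed to it are hypothetical, and the directory name is a parameter rather than a fixed value, since the post only refers to it as 0-reads.

```python
import os


def link_run_dir(src_run, dst_run, name):
    """Symlink src_run/name into dst_run and return the link path."""
    src = os.path.abspath(os.path.join(src_run, name))
    dst = os.path.join(dst_run, name)
    if not os.path.isdir(src):
        raise IOError('source directory not found: {}'.format(src))
    if os.path.lexists(dst):
        # Leave any existing directory or link untouched.
        return dst
    os.symlink(src, dst)
    return dst
```

Calling link_run_dir('/path/to/raw_run', '/path/to/unzip_run', '0-reads') would create /path/to/unzip_run/0-reads pointing at the raw run's directory. An existing directory or link at the destination is left untouched, so the call is safe to repeat.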