Closed dwaggott closed 9 years ago
Daryl;
Sorry about the issue. bwa is complaining because it can't find the pre-created bwa indices for the genome. The command line does look strange, as it points to the seq directory in the bwa mem call:
/srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37
where it should be pointing to:
/srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa
Is it possible something changed with the installed galaxy *.loc files in /srv/gs1/projects/ashley/apps/bcbio/galaxy/tool-data/bwa_index.loc so they point to the wrong location?
I did end up running --genomes GRCh37 a second time after I got a complaint of missing files; I assumed they had failed to install properly. If I accidentally used bcbio_nextgen_install.py instead of bcbio_nextgen.py upgrade, would it overwrite the loc file directory and ultimately ruin the install?
I only see loc files for sam, picard and gatk.
I'm upgrading a few tools, including bwa, to see if that sorts it out.
Daryl;
Running bcbio_nextgen_install.py against an existing installation should work, although I'm confused as to what is going on. Do you have a bwa directory of indices in /srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/? Is it possible you didn't add --aligners bwa to the install/upgrade command line? I'm not sure why you wouldn't have a bwa-based .loc file beyond that. Hope a re-run of the data install adding bwa fixes it.
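A quick way to check what a bwa .loc file points to is sketched below. This is only an illustration, not part of bcbio: the function name bwa_loc_entries is hypothetical, and it assumes the .loc file is tab-separated with the index base path in the last column and comment lines starting with #.

```python
from pathlib import Path


def bwa_loc_entries(loc_file):
    """List the index base paths recorded in a .loc file.

    Assumption (not taken from bcbio): each non-comment line is
    tab-separated and the index base path is the last field.
    Returns None when the .loc file itself does not exist.
    """
    path = Path(loc_file)
    if not path.is_file():
        return None  # no .loc file at all
    entries = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        entries.append(line.split("\t")[-1])
    return entries
```

Calling it on the bwa_index.loc path mentioned above would return None if the file was never written, or the recorded base paths so they can be compared with the bwa/ directory.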
Upgrading with the aligners added fixed it.
Wait, it was running, but now I'm seeing what looks like a similar error.
I couldn't find the referenced bwa index. The upgrade didn't report a pure *.fa being unpacked.
$ ll /srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/
total 5.1G
-rw-rw-r-- 1 dwaggott scgpm-informatics_ashley 2.9G Mar 20 2013 GRCh37.fa.bwt
-rw-rw-r-- 1 dwaggott scgpm-informatics_ashley 740M Mar 20 2013 GRCh37.fa.pac
-rw-rw-r-- 1 dwaggott scgpm-informatics_ashley 6.7K Mar 20 2013 GRCh37.fa.ann
-rw-rw-r-- 1 dwaggott scgpm-informatics_ashley 6.5K Mar 20 2013 GRCh37.fa.amb
-rw-rw-r-- 1 dwaggott scgpm-informatics_ashley 1.5G Mar 20 2013 GRCh37.fa.sa
[2015-04-02T14:31Z] scg1-3-5.local: Uncaught exception occurred
Traceback (most recent call last):
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /home/dwaggott/ashley1/apps/bin/bwa mem -M -t 12 -R '@RG\tID:3\tPL:illumina\tPU:3_2015-03-28_tk_gatk_joint\tSM:LP6005692-DNA_C03' -v 1 /srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa /srv/gs1/projects/ashley/tk/results/bcbio/tk_gatk_joint/work/align_prep/LP6005692-DNA_C03-1.fq.gz /srv/gs1/projects/ashley/tk/results/bcbio/tk_gatk_joint/work/align_prep/LP6005692-DNA_C03-2.fq.gz | /home/dwaggott/ashley1/apps/bin/samblaster --splitterFile >(/home/dwaggott/ashley1/apps/bin/samtools sort -@ 12 -m 682M -T /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-spl -o /home/dwaggott/scratch/bcbiotx/f9481993-8b95-4843-9b32-e8864e97ebac/tmpS3E_DY/3_2015-03-28_tk_gatk_joint-sort-sr.bam /dev/stdin) --discordantFile >(/home/dwaggott/ashley1/apps/bin/samtools sort -@ 12 -m 682M -T /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-disc -o /home/dwaggott/scratch/bcbiotx/76b5038d-8be9-4630-918c-48ef051a85d1/tmptLQHpM/3_2015-03-28_tk_gatk_joint-sort-disc.bam /dev/stdin) | samtools view -b -S -u - | sambamba sort -t 12 -m 682M --tmpdir /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-full -o /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort.bam /dev/stdin
[M::mem_pestat] mean and std.dev: (2749.66, 966.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9933)
[M::mem_pestat] skip orientation FF
...
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7550)
[M::mem_pestat] mean and std.dev: (2863.50, 981.85)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9334)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
samblaster: Can't find first and/or second of pair in sam block of length 1 for id: HS2000-9101_162:8:2114:15015:27353
samblaster: Are you sure the input is sorted by read ids?
' returned non-zero exit status 1
[2015-04-02T14:31Z] scg1-3-5.local: Unexpected error
Traceback (most recent call last):
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/ipythontasks.py", line 38, in _setup_logging
yield config
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/ipythontasks.py", line 79, in process_alignment
return ipython.zip_args(apply(sample.process_alignment, *args))
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 98, in process_alignment
data = align_to_sort_bam(fastq1, fastq2, aligner, data)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 64, in align_to_sort_bam
names, align_dir, data)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 98, in _align_from_fastq
out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 110, in align_pipe
names, rg_info, data)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 128, in _align_mem
[do.file_nonempty(tx_out_file), do.file_reasonable_size(tx_out_file, fastq_file)])
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/home/dwaggott/ashley1/apps/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /home/dwaggott/ashley1/apps/bin/bwa mem -M -t 12 -R '@RG\tID:3\tPL:illumina\tPU:3_2015-03-28_tk_gatk_joint\tSM:LP6005692-DNA_C03' -v 1 /srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa /srv/gs1/projects/ashley/tk/results/bcbio/tk_gatk_joint/work/align_prep/LP6005692-DNA_C03-1.fq.gz /srv/gs1/projects/ashley/tk/results/bcbio/tk_gatk_joint/work/align_prep/LP6005692-DNA_C03-2.fq.gz | /home/dwaggott/ashley1/apps/bin/samblaster --splitterFile >(/home/dwaggott/ashley1/apps/bin/samtools sort -@ 12 -m 682M -T /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-spl -o /home/dwaggott/scratch/bcbiotx/f9481993-8b95-4843-9b32-e8864e97ebac/tmpS3E_DY/3_2015-03-28_tk_gatk_joint-sort-sr.bam /dev/stdin) --discordantFile >(/home/dwaggott/ashley1/apps/bin/samtools sort -@ 12 -m 682M -T /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-disc -o /home/dwaggott/scratch/bcbiotx/76b5038d-8be9-4630-918c-48ef051a85d1/tmptLQHpM/3_2015-03-28_tk_gatk_joint-sort-disc.bam /dev/stdin) | samtools view -b -S -u - | sambamba sort -t 12 -m 682M --tmpdir /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort-sorttmp-full -o /home/dwaggott/scratch/bcbiotx/86be74d6-919f-4450-8aa1-cf0138a67a11/tmpUmyeiQ/3_2015-03-28_tk_gatk_joint-sort.bam /dev/stdin
[M::mem_pestat] mean and std.dev: (2749.66, 966.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9933)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
Daryl;
Your bwa directory looks right. There is no *.fa file; it just serves as the base name to find the index files. The command also looks right, but it appears to be dying prematurely, leading to the error from samblaster. I can't diagnose from this error log, but there are probably other errors upstream of this that might be causative if you want to dig into it more. You could also run in single core mode and post the log as a gist, and I might be able to identify the issue. Sorry about the problems but hope this helps.
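To make the base name point concrete, here is a minimal sketch that checks whether the index files exist for a given prefix. The helper name missing_bwa_index_files is hypothetical, and the suffix list is simply the five extensions shown in the directory listing above.

```python
from pathlib import Path

# The five index file extensions seen in the bwa/ directory listing.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(prefix):
    """Return the index files that are absent for a bwa base name.

    The prefix (for example .../bwa/GRCh37.fa) does not need to exist
    as a file itself; only prefix + suffix is checked.
    """
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]
```

An empty list for /srv/gs1/projects/ashley/apps/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa would mean the index is complete even though no GRCh37.fa file sits in that directory.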
Please re-open if you still run into issues. You should also be able to close your own issues if they are resolved.
Latest code. Joint gatk pipeline on n=4 whole genomes using an sge cluster. Scanning the qacct output, it looks like memory was fine.