kspham closed this issue.
Yep, the same happened to me.
And how did you resume it? From the tophat2 step, skipping the trimming steps that had already been done?
Hi Son and Lorena,
Sorry for the trouble: the default minimal installation wasn't installing bowtie2. I added it to the installer here: 28417f2. Son, if you run:
bcbio_nextgen.py upgrade --aligners bowtie2
it should install the bowtie2 indices for GRCh37.
The pipeline will automatically pick up where it left off, so you should be all good after installing the missing bowtie2 index.
Thanks for the report!
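For readers wondering how "pick up where it left off" can work in general: a common approach is to treat each step's finished output file as its checkpoint and skip any step whose output already exists. The sketch below is a generic illustration under that assumption, not bcbio-nextgen's code; `run_if_missing` and `produce` are hypothetical names. Writing to a temporary name and renaming only at the end keeps an interrupted step from looking finished on restart (the `tx/tmpH1ln9i` directories that show up in the RNA-SeQC command later in this thread look like the same idea).

```python
import os


def run_if_missing(out_file, produce):
    """Run one pipeline step only if its final output is not there yet.

    Hypothetical helper, not bcbio-nextgen's API. `produce` is any callable
    that writes the step's result to the path it is given.
    """
    if os.path.exists(out_file):
        return False  # finished on an earlier run: skip the step
    tmp_file = out_file + ".tmp"
    produce(tmp_file)  # write under a temporary name first
    os.replace(tmp_file, out_file)  # publish only once the step has completed
    return True
```

On a restart, every step whose output already exists returns False and is skipped, so re-running the whole pipeline only redoes the missing work.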
Very strange:
$ bcbio_nextgen.py upgrade --aligners bowtie2
usage: bcbio_nextgen.py upgrade [-h] [--tooldir TOOLDIR] [--tools] [-u {stable,development,system,skip}] [--toolplus {protected,data}] [--genomes {GRCh37,hg19,mm10,mm9,rn5,canFam3}] [--aligners {bowtie,bowtie2,bwa,novoalign,star,ucsc}] [--data] [--nosudo] [--isolate] [--tooldist {minimal,full}] [--distribution {ubuntu,debian,centos,scientificlinux,macosx}]

optional arguments:
  -h, --help            show this help message and exit
  --tooldir TOOLDIR     Directory to install 3rd party software tools. Leave unspecified for no tools
  --tools               Boolean argument specifying upgrade of tools. Uses previously saved install directory
  -u {stable,development,system,skip}, --upgrade {stable,development,system,skip}
                        Code version to upgrade
  --toolplus {protected,data}
                        Specify additional tool categories to install
  --genomes {GRCh37,hg19,mm10,mm9,rn5,canFam3}
                        Genomes to download
  --aligners {bowtie,bowtie2,bwa,novoalign,star,ucsc}
                        Aligner indexes to download
  --data                Upgrade data dependencies
  --nosudo              Specify we cannot use sudo for commands
  --isolate             Created an isolated installation without PATH updates
  --tooldist {minimal,full}
                        Type of tool distribution to install. Defaults to a minimum install.
  --distribution {ubuntu,debian,centos,scientificlinux,macosx}
                        Operating system distribution
Oops. How about:
bcbio_nextgen.py upgrade --aligners bowtie2 --genomes GRCh37 --data
It works. But the pipeline doesn't automatically pick up where it left off! It's unzipping the read files again.
Hi Son,
@porterjamesj just fixed that issue here: https://github.com/chapmanb/bcbio-nextgen/pull/270. If you upgrade to the development version you should pull in his patch:
bcbio_nextgen.py upgrade -u development
The upgrade helps somewhat (it bypasses the unzipping stage), but it seems to use the original FASTQ files rather than the trimmed reads :(
Hi Son,
I'm really sorry to make you run around like this. It looks like the trimming got accidentally dropped from the RNA-seq pipeline in the development version; we've been retooling some of the infrastructure and I missed this one. I restored it here: a1ed4064b2d46d1e. I've reopened the issue; could you let me know whether it runs okay now? If you run the upgrade again, you will get the fix.
Works. Please close!
Hi All, I have followed the instructions above, but still get the error message:
$ /data/aminData/bcbio-nextgen/anaconda/bin/python bcbio_nextgen.py /data/aminData/bcbio-nextgen/galaxy/bcbio_system.yaml data/aminData/example/rnaseqExample/config/smallRNA.yaml
[2014-03-13 11:19] Using input YAML configuration: /data/aminData/example/rnaseqExample/config/smallRNA.yaml
[2014-03-13 11:19] Checking sample YAML configuration: /data/aminData/example/rnaseqExample/config/smallRNA.yaml
[2014-03-13 11:19] Testing minimum versions of installed programs
[2014-03-13 11:19] Resource requests: picard; memory: 2.5; cores: 1
[2014-03-13 11:19] Configuring 1 jobs to run, using 1 cores each with 2.8g of memory reserved for each job
[2014-03-13 11:19] run local -- checkpoint passed: trimming
[2014-03-13 11:19] multiprocessing: process_lane
[2014-03-13 11:19] Preparing 1_070113_control_experiment_small_COLO
[2014-03-13 11:19] multiprocessing: trim_lane
[2014-03-13 11:19] Trimming low quality ends and read through adapter sequence from /data/aminData/example/rnaseqExample/input/small_COLO_R1.fastq, /data/aminData/example/rnaseqExample/input/small_COLO_R2.fastq.
[2014-03-13 11:19] Resource requests: tophat2; memory: 1.0; cores: 16
[2014-03-13 11:19] Configuring 1 jobs to run, using 1 cores each with 1.2g of memory reserved for each job
[2014-03-13 11:19] multiprocessing: process_alignment
[2014-03-13 11:19] Aligning lane 1_070113_control_experiment_small_COLO with tophat2 aligner
Traceback (most recent call last):
File "/data/aminData/bcbio-nextgen/anaconda/bin/bcbio_nextgen.py", line 59, in
Hi Amin,
Sorry for the trouble. Hmmm. The value of ref_file there should look like this (with the trailing GRCh37):
/data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/bowtie2/GRCh37
In your /data/aminData/bcbio-nextgen/galaxy/tool-data/bowtie2_indices.loc, is the entry for GRCh37 missing the trailing GRCh37?
Hi Rory, thanks a million for your reply. For some reason my bowtie2_indices.loc had two entries:
$ more bowtie2_indices.loc
GRCh37	GRCh37	Human (GRCh37)	/data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/bowtie2
GRCh37	GRCh37	Human (GRCh37)	/data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/bowtie2/GRCh37
Probably the first one was added by the automated installer and the second one when I upgraded? Removing the first one solved the problem. Thanks, -A
Great, Amin, glad to hear it. I'm not sure where that second one came from. Hmm.
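The root cause here was a duplicate GRCh37 entry whose path stopped at the `bowtie2` directory instead of the `.../bowtie2/GRCh37` index prefix. Below is a small sketch for spotting that in a `.loc` file, assuming the usual Galaxy layout of four tab-separated columns (build id, dbkey, display name, path) as in Amin's listing above. `check_loc_entries` is a hypothetical helper, and the rule "the path should end in the build name" is an assumption based on the bcbio layout Rory described, not a documented requirement.

```python
import os
from collections import defaultdict


def check_loc_entries(lines):
    """Flag suspicious entries in a Galaxy-style bowtie2_indices.loc file.

    Assumes each data line has four tab-separated columns:
    build id, dbkey, display name, index prefix path.
    """
    paths_by_build = defaultdict(list)
    warnings = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4:
            warnings.append("malformed line: %r" % line)
            continue
        paths_by_build[fields[0]].append(fields[3])
    for build_id, paths in paths_by_build.items():
        if len(paths) > 1:
            warnings.append("%s: %d entries" % (build_id, len(paths)))
        for path in paths:
            # Assumed bcbio layout: the prefix ends in the build name
            # (.../bowtie2/GRCh37), not at the bowtie2 directory itself.
            if os.path.basename(path.rstrip("/")) != build_id:
                warnings.append("%s: path is not an index prefix: %s" % (build_id, path))
    return warnings
```

Passing it the open file handle for /data/aminData/bcbio-nextgen/galaxy/tool-data/bowtie2_indices.loc with the two entries above should report both the duplicate and the directory-only path.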
Oops: after the fix above, it got quite far but then failed:
[2014-03-13 23:03]
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 59, in
Hi Amin,
Bummer: it looks like the Python library pandas isn't installed. If you run bcbio_nextgen.py upgrade -u development, does that resolve the issue?
Was bcbio-nextgen installed with the installer? It looks like the bcbio-nextgen that is getting called is in the system-wide Python directory.
Thanks a lot for the report!
Thanks a lot, Rory. It worked. Can you please also point me to how I can run ReorderSam? Now it complains that:
  File "/data/aminData/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 117, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; java -jar -Xms750m -Xmx20g /data/aminData/tools/share/java/RNA-SeQC/RNA-SeQC_v1.1.7.jar -n 1000 -s /data/aminData/example/rnaseqExample/work/qc/Control_rep1_COLO/tx/tmpH1ln9i/rnaseqc/sample_file.txt -t /data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/rnaseq/ref-transcripts.gtf -r /data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -o /data/aminData/example/rnaseqExample/work/qc/Control_rep1_COLO/tx/tmpH1ln9i/rnaseqc -BWArRNA /data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/rnaseq/rRNA.fa -bwa /data/aminData/tools/bin/bwa -ttype 2
RNA-SeQC v1.1.7 05/14/12
Retriving contig names from reference
contig names in reference: 84
Loading GTF for Read Counting
Converting to refGene
Transcript objects to RefGen format: 2 s
Running IntronicExpressionReadBlock Walker ....
Arguments: [-T, IntronicExpressionReadBlock, --outfile_metrics, /data/aminData/example/rnaseqExample/work/qc/Control_rep1_COLO/tx/tmpH1ln9i/rnaseqc/Control_rep1_COLO/Control_rep1_COLO.metrics.tmp.txt, -R, /data/aminData/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa, -I, /data/aminData/example/rnaseqExample/work/align/Control_rep1_COLO/1_070113_control_experiment_small_COLO_tophat/1_070113_control_experiment_small_COLO.bam, -refseq, /data/aminData/example/rnaseqExample/work/qc/Control_rep1_COLO/tx/tmpH1ln9i/rnaseqc/refGene.txt, -l, ERROR]
org.broadinstitute.sting.utils.exceptions.UserException$LexicographicallySortedSequenceDictionary: Lexicographically sorted human genome sequence detected in reads.
For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wiki/index.php/ReorderSam
reads contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, GL000191.1, GL000192.1, GL000193.1, GL000194.1, GL000195.1, GL000196.1, GL000197.1, GL000198.1, GL000199.1, GL000200.1, GL000201.1, GL000202.1, GL000203.1, GL000204.1, GL000205.1, GL000206.1, GL000207.1, GL000208.1, GL000209.1, GL000210.1, GL000211.1, GL000212.1, GL000213.1, GL000214.1, GL000215.1, GL000216.1, GL000217.1, GL000218.1, GL000219.1, GL000220.1, GL000221.1, GL000222.1, GL000223.1, GL000224.1, GL000225.1, GL000226.1, GL000227.1, GL000228.1, GL000229.1, GL000230.1, GL000231.1, GL000232.1, GL000233.1, GL000234.1, GL000235.1, GL000236.1, GL000237.1, GL000238.1, GL000239.1, GL000240.1, GL000241.1, GL000242.1, GL000243.1, GL000244.1, GL000245.1, GL000246.1, GL000247.1, GL000248.1, GL000249.1, MT, X, Y]
    at org.broadinstitute.sting.utils.SequenceDictionaryUtils.validateDictionaries(SequenceDictionaryUtils.java:128)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.validateSourcesAgainstReference(GenomeAnalysisEngine.java:730)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:809)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:672)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:227)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.cga.rnaseq.gatk.GATKTools.runIntronReadCount(GATKTools.java:226)
    at org.broadinstitute.cga.rnaseq.ReadCountMetrics.runRegionCounting(ReadCountMetrics.java:243)
    at org.broadinstitute.cga.rnaseq.ReadCountMetrics.runReadCountMetrics(ReadCountMetrics.java:58)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.runMetrics(RNASeqMetrics.java:220)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.execute(RNASeqMetrics.java:166)
    at org.broadinstitute.cga.rnaseq.RNASeqMetrics.main(RNASeqMetrics.java:135)
RNA-SeQC Total Runtime: 0 min
' returned non-zero exit status 3
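The GATK complaint is about contig order: the reads list the chromosomes lexicographically (1, 10, 11, ..., 19, 2, 20, ...), while GATK wants karyotypic order (1, 2, ..., 22, X, Y). Below is a simplified sketch of that check, which you could run on the @SQ names from `samtools view -H`. It only looks at 1-22, X and Y, accepts an optional `chr` prefix, and ignores MT and unplaced contigs; it is not GATK's actual implementation, and the function names are hypothetical.

```python
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]
RANK = {name: i for i, name in enumerate(PRIMARY)}


def strip_chr(name):
    """Drop a UCSC-style 'chr' prefix so 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def primary_in_karyotypic_order(contigs):
    """True if 1..22, X, Y appear in karyotypic order.

    Mitochondrial and unplaced contigs (MT, GL000191.1, ...) are ignored in
    this simplified check, since GATK allows M at either end.
    """
    ranks = [RANK[strip_chr(c)] for c in contigs if strip_chr(c) in RANK]
    return ranks == sorted(ranks)
```

Applied to the `reads contigs` list in the error above, it returns False, because 10 through 19 come before 2.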
Hi Amin,
Sorry for all of the issues. Did you use Tophat to align the samples? I fixed that bug here: 18dea9adb0850cb, so if you used Tophat, updating your bcbio-nextgen installation to the development version should fix it:
bcbio_nextgen.py upgrade -u development
Hi Rory, thanks a lot. It works.
Great, thanks for following up! Let us know if you run into any more issues.
ValueError: Cannot detect which reference version /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/bowtie2/GRCh37 is. Should end in either .ebwt (bowtie) or .bt2 (bowtie2).
It's clear that the default installation forgot to download the bowtie indices. S.
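A bowtie index is not a single file but a set of files sharing a prefix: `GRCh37.1.bt2`, `GRCh37.rev.1.bt2`, and so on for bowtie2, and `GRCh37.1.ebwt`, and so on for bowtie. The path in the error is such a prefix, so one way to check whether the index was actually installed is to look for those files on disk. `detect_bowtie_index_type` is a hypothetical helper for that purpose, not the check bcbio-nextgen performs.

```python
import glob


def detect_bowtie_index_type(prefix):
    """Guess whether `prefix` names a bowtie or bowtie2 index.

    Hypothetical helper: it looks for index files that share the prefix
    (e.g. GRCh37.1.bt2 / GRCh37.rev.1.bt2 for bowtie2, GRCh37.1.ebwt for bowtie).
    """
    pattern = glob.escape(prefix)  # the prefix itself may contain glob characters
    if glob.glob(pattern + ".*.bt2"):
        return "bowtie2"
    if glob.glob(pattern + ".*.ebwt"):
        return "bowtie"
    raise ValueError("No .bt2 or .ebwt index files found for prefix %s" % prefix)
```

Pointed at /usr/local/share/bcbio-nextgen/genomes/Hsapiens/GRCh37/bowtie2/GRCh37 on an installation where the bowtie2 indices were never downloaded, it would raise ValueError, which is consistent with the missing-index diagnosis earlier in the thread.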