Closed RafaelMViana closed 8 years ago
Thanks. We haven't seen that bug before. The 'N7' failure is an internal failure to get anything out of the RNAfold call for a locus. It shouldn't happen.
Can you send me the Log.txt file from this failed run and I will start from there...
Can you test this command on the system you are using?
echo AAAAAAAAAAAAAACCCCCCUUUUUUUUUUUUUUUUUU | RNAfold --noPS > test_stdout.txt 2> test_stderr.txt
What do you see in the test_stdout.txt and test_stderr.txt files? You should see the RNAfold result in the stdout, and nothing in the stderr. If not, then we have isolated the problem.
Hi Mike, It happened as you told:
stderr: empty
stdout:
AAAAAAAAAAAAAACCCCCCUUUUUUUUUUUUUUUUUU ((((((((((((((......)))))))))))))).... ( -6.60)
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 2:56 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Can you test this command on the system you are using?
echo AAAAAAAAAAAAAACCCCCCUUUUUUUUUUUUUUUUUU | RNAfold --noPS > test_stdout.txt 2> test_stderr.txt
What do you see in the test_stdout.txt and test_stderr.txt files? You should see the RNAfold result in the stdout, and nothing in the stderr. If not, then we have isolated the problem.
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137728933.
Here you have the logs from 3 different samples
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 2:53 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Thanks. We haven't seen that bug before. The 'N7' failure is an internal failure to get anything out of the RNAfold call for a locus. It shouldn't happen.
Can you send me the Log.txt file from this failed run and I will start from there...
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137728438.
ShortStack version 3.0 Thu Aug 20 17:54:53 CEST 2015 hostname: vbsg-hennig5 working directory: /media/diskc
Settings: RNAfold_version: 2 bowtie_cores: 20 bowtie_m: 50 dicermax: 28 dicermin: 17 genomefile: /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce.fa logfile: /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_1/Log.txt mincov: 2 mismatches: 1 mmap: f nostitch: 1 outdir: /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_1 pad: 75 ranmax: 10 readfile: /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/Spruce1_size_17-28_tRNA_rRNA_Picea_clean.fastq
Run Progress and Messages:
Thu Aug 20 17:54:53 CEST 2015 Starting alignment of /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/Spruce1_size_17-28_tRNA_rRNA_Picea_clean.fastq with bowtie command: bowtie -q -v 1 -p 20 -S -a -m 50 --best --strata --sam-RG ID:Spruce1_size_17-28_tRNA_rRNA_Picea_clean /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce - < /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/Spruce1_size_17-28_tRNA_rRNA_Picea_clean.fastq Completed. Results are in temporary file /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_1/Spruce1_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz pending final processing Unique mappers: 937891 / 3561423 (26.3 %) Multi mappers: 1699957 / 3561423 (47.7 %) Multi mappers ignored and marked as unmapped: 119240 / 3561423 (3.3 %) Non mappers: 804335 / 3561423 (22.6 %)
Thu Aug 20 18:02:20 CEST 2015 Processing and sorting alignments Working on /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_1/Spruce1_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz Summary of primary alignments: XY:Z:N -- Unmapped because no valid alignments: 804335 / 3561423 (22.6 %) XY:Z:M -- Unmapped because alignment number exceeded option bowtie_m 50: 119240 / 3561423 (3.3 %) XY:Z:O -- Unmapped because alignment number exceeded option ranmax 10 and no guidance was possible: 3783 / 3561423 (0.1 %) XY:Z:U -- Uniquely mapped: 937891 / 3561423 (26.3 %) XY:Z:R -- Multi-mapped with primary alignment chosen randomly: 51046 / 3561423 (1.4 %) XY:Z:P -- Multi-mapped with primary alignment chosen based on f: 1645128 / 3561423 (46.2 %) Alignment completed. Final bam file is /media/diskc/Spruce-sRNA/reads/spruce_1/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_1/Spruce1_size_17-28_tRNA_rRNA_Picea_clean.bam
Thu Aug 20 18:12:07 CEST 2015 Performing de-novo cluster identification and analyses.
Sun Aug 23 21:21:57 CEST 2015 Tally of loci by predominant RNA size (DicerCall):
DicerCall NotMIRNA MIRNA N or NA 0 0 17 14172 0 18 10157 0 19 8280 0 20 7068 0 21 14423 0 22 4186 0 23 3630 0 24 5613 0 25 2342 0 26 1732 0 27 1386 0 28 981 0
ShortStack version 3.0 Sun Aug 23 21:22:15 CEST 2015 hostname: vbsg-hennig5 working directory: /media/diskc
Settings: RNAfold_version: 2 bowtie_cores: 20 bowtie_m: 50 dicermax: 28 dicermin: 17 genomefile: /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce.fa logfile: /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_2/Log.txt mincov: 2 mismatches: 1 mmap: f nostitch: 1 outdir: /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_2 pad: 75 ranmax: 10 readfile: /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/Spruce2_size_17-28_tRNA_rRNA_Picea_clean.fastq
Run Progress and Messages:
Sun Aug 23 21:22:15 CEST 2015 Starting alignment of /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/Spruce2_size_17-28_tRNA_rRNA_Picea_clean.fastq with bowtie command: bowtie -q -v 1 -p 20 -S -a -m 50 --best --strata --sam-RG ID:Spruce2_size_17-28_tRNA_rRNA_Picea_clean /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce - < /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/Spruce2_size_17-28_tRNA_rRNA_Picea_clean.fastq Completed. Results are in temporary file /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_2/Spruce2_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz pending final processing Unique mappers: 1322228 / 4760173 (27.8 %) Multi mappers: 2464144 / 4760173 (51.8 %) Multi mappers ignored and marked as unmapped: 195901 / 4760173 (4.1 %) Non mappers: 777900 / 4760173 (16.3 %)
Sun Aug 23 21:41:29 CEST 2015 Processing and sorting alignments Working on /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_2/Spruce2_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz Summary of primary alignments: XY:Z:N -- Unmapped because no valid alignments: 777900 / 4760173 (16.3 %) XY:Z:M -- Unmapped because alignment number exceeded option bowtie_m 50: 195901 / 4760173 (4.1 %) XY:Z:O -- Unmapped because alignment number exceeded option ranmax 10 and no guidance was possible: 3397 / 4760173 (0.1 %) XY:Z:U -- Uniquely mapped: 1322228 / 4760173 (27.8 %) XY:Z:R -- Multi-mapped with primary alignment chosen randomly: 87098 / 4760173 (1.8 %) XY:Z:P -- Multi-mapped with primary alignment chosen based on f: 2373649 / 4760173 (49.9 %) Alignment completed. Final bam file is /media/diskc/Spruce-sRNA/reads/spruce_2/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_2/Spruce2_size_17-28_tRNA_rRNA_Picea_clean.bam
Sun Aug 23 21:54:26 CEST 2015 Performing de-novo cluster identification and analyses.
Thu Aug 27 17:15:38 CEST 2015 Tally of loci by predominant RNA size (DicerCall):
DicerCall NotMIRNA MIRNA N or NA 0 0 17 18994 0 18 13310 0 19 10929 0 20 9307 0 21 17572 0 22 5762 0 23 5167 0 24 10441 0 25 2891 0 26 1940 0 27 1226 0 28 489 0
ShortStack version 3.0 Thu Aug 27 17:16:00 CEST 2015 hostname: vbsg-hennig5 working directory: /media/diskc
Settings: RNAfold_version: 2 bowtie_cores: 20 bowtie_m: 50 dicermax: 28 dicermin: 17 genomefile: /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce.fa logfile: /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3/Log.txt mincov: 2 mismatches: 1 mmap: f nostitch: 1 outdir: /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3 pad: 75 ranmax: 10 readfile: /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/Spruce3_size_17-28_tRNA_rRNA_Picea_clean.fastq
Run Progress and Messages:
Thu Aug 27 17:16:00 CEST 2015 Starting alignment of /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/Spruce3_size_17-28_tRNA_rRNA_Picea_clean.fastq with bowtie command: bowtie -q -v 1 -p 20 -S -a -m 50 --best --strata --sam-RG ID:Spruce3_size_17-28_tRNA_rRNA_Picea_clean /media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce - < /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/Spruce3_size_17-28_tRNA_rRNA_Picea_clean.fastq Completed. Results are in temporary file /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3/Spruce3_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz pending final processing Unique mappers: 1201737 / 4131601 (29.1 %) Multi mappers: 2056696 / 4131601 (49.8 %) Multi mappers ignored and marked as unmapped: 296566 / 4131601 (7.2 %) Non mappers: 576602 / 4131601 (14.0 %)
Thu Aug 27 17:24:35 CEST 2015 Processing and sorting alignments Working on /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3/Spruce3_size_17-28_tRNA_rRNA_Picea_clean_readsorted.sam.gz Summary of primary alignments: XY:Z:N -- Unmapped because no valid alignments: 576602 / 4131601 (14.0 %) XY:Z:M -- Unmapped because alignment number exceeded option bowtie_m 50: 296566 / 4131601 (7.2 %) XY:Z:O -- Unmapped because alignment number exceeded option ranmax 10 and no guidance was possible: 3563 / 4131601 (0.1 %) XY:Z:U -- Uniquely mapped: 1201737 / 4131601 (29.1 %) XY:Z:R -- Multi-mapped with primary alignment chosen randomly: 47010 / 4131601 (1.1 %) XY:Z:P -- Multi-mapped with primary alignment chosen based on f: 2006123 / 4131601 (48.6 %) Alignment completed. Final bam file is /media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3/Spruce3_size_17-28_tRNA_rRNA_Picea_clean.bam
Thu Aug 27 17:35:04 CEST 2015 Performing de-novo cluster identification and analyses.
Sun Aug 30 21:56:56 CEST 2015 Tally of loci by predominant RNA size (DicerCall):
DicerCall NotMIRNA MIRNA N or NA 0 0 17 14124 0 18 10069 0 19 8367 0 20 7556 0 21 17028 0 22 5499 0 23 5059 0 24 7204 0 25 3173 0 26 2067 0 27 895 0 28 334 0
Darn.
Are you able to share one of the alignment files (for instance, media/diskc/Spruce-sRNA/reads/spruce_3/rRNA_filtered/local/tRNA_filtered/map_full/ShortStack_spruce_3/Spruce3_size_17-28_tRNA_rRNA_Picea_clean.bam) along with the genome (/media/diskc/genomes/spruce/full_spruce/bowtie1_index/full_spruce.fa) with me (you could use dropbox of google drive) .. that way I can here on my own system to try and reproduce and isolate this issue.
Since we've ruled out problem problem with RNAfold I think it must be that somehow no genomic intervals are being retrieved.
Hi again. Can you paste the contents of the genome index (the .fai file)?
The spruce genome is quite huge
The .fasta is 12.7 GB The .fai is 421.9 MB
The first 10 lines of the .fai look like this
MA_20601_len=208095 208095 21 80 81 MA_20603_len=198197 198197 210739 80 81 MA_159042_len=194057_putative_contaminant 194057 411457 80 81 MA_20595_len=168074 168074 607961 80 81 MA_20424_len=165328 165328 778157 80 81 MA_19111_len=163407 163407 945573 80 81 MA_20598_len=162097 162097 1111044 80 81 MA_10435771_len=161784 161784 1275192 80 81 MA_18863_len=161531 161531 1439020 80 81 MA_10432209_len=161390 161390 1602595 80 81
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 3:41 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Hi again. Can you paste the contents of the genome index (the .fai file)?
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137740992.
Yowza that is big.
Can you test on your system whether the samtools faidx command can do sequence retrieval? I wonder about chromosome names with equal signs .. whether that might be causing an error.
Example (assuming the genome is indexed)
samtools faidx genome.fasta MA_20603_len=198197:1000-1500
You should get back a fasta chunk corresponding to positions 1000-1500 of chromosome MA_20603_len=198197
If you don't, then we've isolated the problem.
Yes I get it
MA_20603_len=198197:1000-1500 GGGTGCTAAAGACTTAATATAAAATATTCTCTTTAGCCGACTTTACTTCTGGGGTTGTGT CCAAGTATTTTCTCGCTTCCGAAATATTCTAATCTCATCAAAGGTCACAAATGGAGGTTA ATACCTCCCGCAACTACCATAATATCAACACTCCCTCCTAGCTAGGGAGGTATAAAAGGT AATGGTCTTGAGCACAAACAAATCATTACAAACAAATCATTCGACTGGTAGCATCATCAT AGCCACACATGCTATGTCATGATCAAATCATGAGCACCGTCAATATTATCCAACTGGATA ATCTCCATGACAATTAGTACAATGTCATATGATAGAACACAACTGTGTCTATCCATAGTA GTACATAACTACTATCCAGTGGTACACAACCACAACAAGTAGCACACGACTACTTAATCA TTGACACATAGTCACAACCAGTGGAACACATCCACCACGGTAACATACATCTACTATCCA GAGACATACAGTCTCCATAGC
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 3:54 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Yowza that is big.
Can you test on your system whether the samtools faidx command can do sequence retrieval? I wonder about chromosome names with equal signs .. whether that might be causing an error.
Example (assuming the genome is indexed)
samtools faidx genome.fasta MA_20603_len=198197:1000-1500
You should get back a fasta chunk corresponding to positions 1000-1500 of chromosome MA_20603_len=198197
If you don't, then we've isolated the problem.
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137743625.
Thanks. So I am out ideas here. Can you send a bam file and the associated genome so I can test locally? My regular email is mja18@psu.edu
Thanks Mike,
I have to ask my boss about sending the whole .bams
On the other hand, I am trying to extract from the ban an interesting contig, MA_42375, that contains a known mir, pab-miR947a. This miR is present in 11 clusters, and fails folding with status N7
when I run grep MA42375 full_spruce_fa.fai
it outputs: MA_42375_len=60015
when I run (having the proper .bai) samtools view -h Spruce1_size_17-28_tRNA_rRNA_Picea_clean.bam MA_42375_len=60015 > MA_42375.sam
it outputs: [main_samview] region "MA_42375_len=60015" specifies an unknown reference name. Continue anyway.
I do not know if this helps
Kind regards
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 4:49 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Thanks. So I am out ideas here. Can you send a bam file and the associated genome so I can test locally? My regular email is mja18@psu.edumailto:mja18@psu.edu
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137757295.
That's suspicious. samtools view is unable to use indexed lookup and that is a problem.
Can you paste what this looks like:
samtools view Spruce1_size_17-28_tRNA_rRNA_Picea_clean.bam | head
I am wondering whether bowtie didn't like your chromosome names, and modified them, or whether samtools view doesn't like them, and wont find them....
it looks like the contig names got truncated
HWI-ST344:303:C3NY3ACXX:8:1304:14575:73574 272 MA_20601 7005 255 17M * 0 0 * * XA:i:0 MD:Z:17 NM:i:0 XX:i:45 XY:Z:P XZ:f:0.002 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2212:9199:93468 272 MA_20601 9376 255 24M * 0 0 * * XA:i:1 MD:Z:7G16 NM:i:1 XX:i:33 XY:Z:P XZ:f:0.029 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2312:9349:59266 272 MA_20601 9664 255 24M * 0 0 * * XA:i:1 MD:Z:13C10 NM:i:1 XX:i:9 XY:Z:P XZ:f:0.091 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:1111:8518:63546 256 MA_20601 11403 255 17M * 0 0 * * XA:i:1 MD:Z:11T5 NM:i:1 XX:i:29 XY:Z:P XZ:f:0.024 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2112:13431:39600 272 MA_20601 14862 255 17M * 0 0 * * XA:i:1 MD:Z:2A14 NM:i:1 XX:i:32 XY:Z:P XZ:f:0.03 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:1316:11250:11607 256 MA_20601 19149 255 24M * 0 0 * * XA:i:0 MD:Z:24 NM:i:0 XX:i:10 XY:Z:P XZ:f:0.083 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2102:1802:75900 272 MA_20601 23262 255 17M * 0 0 * * XA:i:1 MD:Z:3T13 NM:i:1 XX:i:7 XY:Z:P XZ:f:0.125 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2116:18452:25763 272 MA_20601 29583 255 17M * 0 0 * * XA:i:1 MD:Z:10T6 NM:i:1 XX:i:35 XY:Z:P XZ:f:0.009 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:1207:10687:16830 256 MA_20601 35679 255 20M * 0 0 * * XA:i:1 MD:Z:3A16 NM:i:1 XX:i:15 XY:Z:P XZ:f:0.06 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean HWI-ST344:303:C3NY3ACXX:8:2116:19224:48573 272 MA_20601 39333 255 21M * 0 0 * * XA:i:0 MD:Z:21 NM:i:0 XX:i:35 XY:Z:P XZ:f:0.027 RG:Z:Spruce1_size_17-28_tRNA_rRNA_Picea_clean
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 5:33 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
That's suspicious. samtools view is unable to use indexed lookup and that is a problem.
Can you paste what this looks like:
samtools view Spruce1_size_17-28_tRNA_rRNA_Picea_clean.bam | head
I am wondering whether bowtie didn't like your chromosome names, and modified them, or whether samtools view doesn't like them, and wont find them....
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137769854.
So it's not clear to me why your contig names were truncated. I did a test where I used similar chromosome names as yours and ran a ShortStack analysis, as follows:
$ cat test_genome.fasta
MA_20601_len=208095 AGCTAGCATACGACCTACAGGACTACGGACCAG MA_20603_len=198197 AGCTCAACGATTCAGCATCAGCATCAGACGCACAGCACAGCATGTAC MA_159042_len=194057_putative_contaminant CGACTCAGCATCGGGGCGGCTATCATTTCTATCTTTA
cat test_reads.fasta
read1 GCATACGACCTACAGGACTA read2 CTCAGCATCGGGGCGGCTATCAT
$ ShortStack --readfile test_reads.fasta --genomefile test_genome.fasta --align_only
$ samtools view -h test_reads.bam @HD VN:1.0 SO:coordinate @SQ SN:MA_20601_len=208095 LN:33 @SQ SN:MA_20603_len=198197 LN:47 @SQ SN:MA_159042_len=194057_putative_contaminant LN:37 @RG ID:test_reads @PG ID:Bowtie VN:0.12.9 CL:"bowtie -f -v 1 -p 1 -S -a -m 50 --best --strata --sam-RG ID:test_reads test_genome -" @PG ID:ShortStack VN:3.3 CL:"--readfile test_reads.fasta --genomefile test_genome.fasta --align_only" read1 0 MA_20601_len=208095 6 255 20M * 0 0 GCATACGACCTACAGGACTA IIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:20 NM:i:RG:Z:test_reads XX:i:1 XY:Z:U XZ:f:1 read2 0 MA_159042_len=194057_putative_contaminant 4 255 23M * 0 0 CTCAGCATCGGGGCGGCTATCAT IIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:23 NM:i:0 RG:Z:test_reads XX:i:1 XY:Z:U XZ:f:1
.. As you can see, the format of your contig names did not cause bowtie, samtools, or ShortStack to truncate the contig names on my system.
I'm using ShortStack 3.3, but I don't think your use of 3.0 would matter. Other versions:
bowtie and bowtie-build 0.12.9 (same as you) samtools 1.2 (same as you).
So I really am stuck here. I bet the truncated chromosome names in your bam file are the issue. I just don't know how they are being produced, since when I use similar names in testing here, I don't get that result.
I guess maybe you could try editing the names of your contigs (substitute = with _ perhaps) and re-run .. maybe there is some difference in our systems...
I have edited the contig names in the .fa, created new .fai and rerun with the same parameters as the previous run. I will tell you about the output when is finished.
Thanks a lot for your help
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Friday, September 4, 2015 7:38 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
I guess maybe you could try editing the names of your contigs (substitute = with _ perhaps) and re-run .. maybe there is some difference in our systems...
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-137801698.
Hi Rafael, Just following up on this issue. Were you able to resolve it?
Dear Mike,
Thanks for your interest, I am currently analyzing ShortStack output yes, it worked and i got miRNAs This time the log looked like this N3 N5 N6 N8 N10 N11 N12 N13 N14 N15 Yes sample_1 47,836 1,824 2,674 19 5 16,064 2,377 1,649 414 1,199 42 sample_2 65,413 2,131 4,166 14 2 19,541 3,036 1,977 414 1,409 49 sample_3 52,175 2,025 5,514 15 6 15,997 2,600 1,694 343 1,194 47
most of the miRNAs were new. I checked my mapped reads vs mirbase to look for False negatives (FN) and got:
FN TP new total
sample_1 24 1 41 66
sample_2 27 2 47 76 sample_3 27 1 47 75
TP are known miRNAs detected by SStack FN are known miRNAs not detected by SStack new are unknown miRNAs detected by SStack
I have defined a cluster with at least 2 reads, and some new detected miRNAs come for clusters with 4,5,6 or lss than 10 reads in cluster
reads in clusters detected as new miRNA sorted by min to max
sample_1 4 6 19 19 26 42 46 54 66 89 sample_2 16 17 18 24 30 59 61 75 77 90 sample_3 4 5 6 7 8 8 8 8 19 21
on the other hand I have FN with more than 1000 reads in the cluster
Reads in clusters with FN miRNA sorted by min to max
2 3 4 5 6 7 8 9 10 >10 >100 >1000 total
sample_1 3 2 1 1 0 0 0 1 2 15 12 11 24 sample_2 3 5 1 1 0 0 0 1 0 18 14 11 27 sample_3 3 2 3 1 2 0 1 0 1 15 13 11 27
I am currently looking for homology of the new species in mir-base
I hope that this can add something for future software improvement Any comment is appreciated
Thanks again for your interest Kind regards
############################
Dr. Rafael M. Viana
Uppsala Biocenter, Almas Allé 5 Department of Plant Biology / Växtbiologi Swedish University of Agricultural Sciences
75651 Uppsala Sweden
############################
From: Mike Axtell notifications@github.com Sent: Monday, October 5, 2015 3:33 PM To: MikeAxtell/ShortStack Cc: Rafael Muñoz Subject: Re: [ShortStack] RNA folding always fails at step7 (#25)
Hi Rafael, Just following up on this issue. Were you able to resolve it?
Reply to this email directly or view it on GitHubhttps://github.com/MikeAxtell/ShortStack/issues/25#issuecomment-145528839.
Thanks.
I would be cautious about calling previously annotated false negatives in this analysis. Certainly some of them are, but others may have been poor annotations to begin with. anyway, it seems this issue is resolved. Thanks for your comments.
Dear Mike, First of all, thank you for your ShortStack program. I really like all the output information that it provides. my issue is about the folding step. All the clusters analyzed for folding fail and no one goes further than step 7
sample N3 N5 N6 N7 total clusters
Sample1 47,743 1,853 2,650 21,724 73,970 Sample2 65,174 2,115 4,255 26,484 98,028 Sample3 52,013 1,976 5,433 21,953 81,375
It is indicated in the manual that failure at N7 can be (possible bug?). I wonder if it is a bug from shortstack, a problem with one of the versions of the programs i am using to support shortstack, or a problem with my samples and genome. I have checked some output sequences in mirbase, and I have found previously reported miRNAs. I know that ShortStack is designed to avoid FP, but it looks suspicious to me that the folding always fails in the same step
I am working with the the big fragmented genome of Norway Spruce (Picea Abies) which has round 10e6 contigs
I Trimmed my reads and selected sizes between 17-28nt. After that, I filtered against rRNA and tRNA, Then, I used shortstack for mapping and sRNA analysis
program versions:
ShortStack v3.0 perl v5.18.2 samtools v1.2 bowtie v0.12.9 bowtie-build v0.12.9 gzip v1.6 RNAfold v2.1.0
mapping output
Sample1 2,637,848 (74.0) 923,575 (26.0) Sample2 3,786,372 (79.6) 973,801 (20.4) Sample3 3,258,433 (78.9) 873,168 (21.2)
ShortStack Settings
bowtie_m: 50 dicermax: 28 dicermin: 17 mincov: 2 mismatches: 1 mmap: f nostitch: 1 pad: 75 ranmax: 10
I would also appreciate your advice for any downstream analysis of such data
Thanks in advance for your time and advice