MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License
88 stars 29 forks source link

only one read mapped #97

Closed DiegoZavallo closed 1 year ago

DiegoZavallo commented 4 years ago

Hi Felix, I have a soybean sRNA-Seq data which I already used in the past with ShortStack to detected miRNAs. Now I want to run it without --locifile so it create the sRNAloci gff3 file to later use it for differential expresion analysis. First I "cat" all the fastq files into one big file using only 21-22 and 24nt reads length.

And then run ShortStack

ShortStack --readfile 5-All_21-22-24nt.fq --genomefile Glycine_max.Glycine_max_v2.1.dna.chromosome.fa --bowtie_cores 14 --mismatches 1 --bowtie_m all --outdir sRNA_loci/

I set bowtie_m all so it mapped also reads that have more than 50 multimapping sites to create a comprehensive annotation file for sRNAloci as I have already asked: https://github.com/MikeAxtell/ShortStack/issues/76

It run all the way, but it seems that it only recongnize 1 read that mapped to the genome!!?

This is what I had:

ShortStack version 3.8.5 Fri Mar 6 09:29:07 EST 2020
hostname: vzresistomica working directory: /home/dzavallo/Soja

Settings: RNAfold_version: 2 bowtie_cores: 14 bowtie_m: all dicermax: 24 dicermin: 20 error_logfile: sRNA_loci/ErrorLogs.txt foldsize: 300 genomefile: Glycine_max.Glycine_max_v2.1.dna.chromosome.fa logfile: sRNA_loci/Log.txt mincov: 0.5rpm mismatches: 1 mmap: u outdir: sRNA_loci pad: 75 ranmax: 3 readfile: 5-All_21-22-24nt.fq
sort_mem: 768M strand_cutoff: 0.8

Run Progress and Messages:

Fri Mar 6 09:29:07 EST 2020
Expected fai file Glycine_max.Glycine_max_v2.1.dna.chromosome.fa.fai not found. Attempting to create using samtools faidx ... Successful

Fri Mar 6 09:29:15 EST 2020
Expected bowtie indices not found. Attempting to make them with bowtie-build... Successful

Fri Mar 6 09:29:15 EST 2020
Expected bowtie indices not found. Attempting to make them with bowtie-build... Successful

Fri Mar 6 10:18:10 EST 2020
Starting alignment of 5-All_21-22-24nt.fq with bowtie command: bowtie -q -v 1 -p 14 -S -a --best --strata --sam-RG ID:5-All_21-22-24nt ./Glycine_max.Glycine_max_v2.1.dna.chromosome - < 5-All_21-22-24nt.fq Completed. Results are in temporary file sRNA_loci/5-All_21-22-24nt_readsorted.sam.gz pending final processing Multi mappers: 1 / 1 (100.0 %)

Sat Mar 7 03:29:59 EST 2020
Processing and sorting alignments Working on sRNA_loci/5-All_21-22-24nt_readsorted.sam.gz Use of uninitialized value in split at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2632. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. Use of uninitialized value within @fields in join or string at /home/dzavallo/software/ShortStack-3.8.5/ShortStack line 2634. [W::sam_parse1] empty query name [E::sam_parse1] missing SAM header [W::sam_read1] Parse error at line 1 [main_samview] truncated file. Summary of primary alignments: XY:Z:N -- Unmapped because no valid alignments: 0 / 1 (0.0 %) XY:Z:M -- Unmapped because alignment number exceeded option bowtie_m all: 0 / 1 (0.0 %) XY:Z:O -- Unmapped because alignment number exceeded option ranmax 3 and no guidance was possible: 0 / 1 (0.0 %) XY:Z:U -- Uniquely mapped: 0 / 1 (0.0 %) XY:Z:R -- Multi-mapped with primary alignment chosen randomly: 1 / 1 (100.0 %) XY:Z:P -- Multi-mapped with primary alignment chosen based on u: 0 / 1 (0.0 %) Alignment completed. Final bam file is sRNA_loci/5-All_21-22-24nt.bam

Sat Mar 7 03:30:00 EST 2020
Performing de-novo cluster identification and analyses.

    At specified mincov of 0.5rpm with 1 primary reads,
    mincov is 1 raw alignments

Completed at Sat Mar 7 03:30:00 EST 2020

Performing search for unplaced small RNAs. Completed at Sat Mar 7 03:30:00 EST 2020

Sat Mar 7 03:30:00 EST 2020
Tally of loci by predominant RNA size (DicerCall):

DicerCall NotMIRNA MIRNA N or NA 0 0 20 0 0 21 0 0 22 0 0 23 0 0 24 0 0

Unplaced small RNAs Size <20 20 21 22 23 24

24

I googled these errors and they seem to have to do with the headers in the files.

Oddly, when I looked at the ErrorLog it appear to have mapped correctly with more than 95% of the read

I attached the ErrorLog.txt and the Log.txt so you can see it. Log.txt ErrorLogs.txt

the headers of my fastq file seems ok, and I downloaded again the Glycine max genome but the results were the same.

Any thoughts?

Best

Diego

MikeAxtell commented 3 years ago

So I'm sorry it took me so long to respond. I am just catching up with open issues now.

Did you ever figure this out?

One thing I can say is that I would not use this strategy. Specifically, I strongly discourage any pre-filtering of the raw reads. If you only align reads of a certain length, then degraded RNA bits (like tRNA and rRNA chunks) will not be obvious .. all you will have left are the random bits that happened to be 20-24 nts long.

Yes, something clearly broke in the alignment step.