MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Possibility to increase the 50 multi-mapping threshold from bowtie in the latest version? #121

Closed: arnauddr37 closed this issue 1 year ago

arnauddr37 commented 1 year ago

Hello, I recently started using ShortStack, and it seems that in the latest version the option to increase the multi-mapping limit in bowtie is gone:

Eliminate option --bowtie_m. Now -k 50 is always used.

I am working on small RNAs from maize, and because this genome has a high number of repetitive regions (which are what I am currently interested in), a lot of my small RNAs end up unmapped because of this limit:

Processing and sorting alignments
Working on /usr/users/.../data/shortstack/fromFastafile/FR5/collapsed_trimmed_FR5_Ves_15_L1_1_umi_trimmed_readsorted.sam.gz
Working on /usr/users/.../data/shortstack/fromFastafile/FR5/collapsed_trimmed_FR5_Ves_57_L1_1_umi_trimmed_readsorted.sam.gz
Working on /usr/users/.../data/shortstack/fromFastafile/FR5/collapsed_trimmed_FR5_Ves_45_L1_1_umi_trimmed_readsorted.sam.gz
Summary of primary alignments:
XY:Z:N -- Unmapped because no valid alignments: 1275970 / 10126678 (12.6 %)
XY:Z:M -- Unmapped because alignment number exceeded option bowtie_m 50: 5649088 / 10126678 (55.8 %)
XY:Z:O -- Unmapped because alignment number exceeded option ranmax 3 and no guidance was possible: 105456 / 10126678 (1.0 %)
XY:Z:U -- Uniquely mapped: 1308100 / 10126678 (12.9 %)
XY:Z:R -- Multi-mapped with primary alignment chosen randomly: 99653 / 10126678 (1.0 %)
XY:Z:P -- Multi-mapped with primary alignment chosen based on u: 1688411 / 10126678 (16.7 %)
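The summary above appears to be built from the XY:Z tag on each read's primary alignment. To recompute these numbers for your own files (for example, to compare runs with different settings), here is a minimal Python sketch. It assumes the XY:Z tag is present as a SAM optional field on exactly one primary record per read; the function names `tally_xy` and `summarize` are hypothetical and not part of ShortStack.

```python
from collections import Counter
from typing import Iterable, List

# Labels copied from the "Summary of primary alignments" block in the log.
XY_LABELS = {
    "N": "Unmapped because no valid alignments",
    "M": "Unmapped because alignment number exceeded option bowtie_m 50",
    "O": "Unmapped because alignment number exceeded option ranmax 3 and no guidance was possible",
    "U": "Uniquely mapped",
    "R": "Multi-mapped with primary alignment chosen randomly",
    "P": "Multi-mapped with primary alignment chosen based on u",
}


def tally_xy(sam_lines: Iterable[str]) -> Counter:
    """Count XY:Z codes on primary SAM records (assumes one primary record per read)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x100 | 0x800):
            continue  # skip secondary and supplementary records
        for tag in fields[11:]:  # optional fields start after the 11 mandatory columns
            if tag.startswith("XY:Z:"):
                counts[tag[len("XY:Z:"):]] += 1
                break
    return counts


def summarize(counts: Counter) -> List[str]:
    """Format the counts in the same style as the summary lines in the log."""
    total = sum(counts.values())
    out = []
    for code, label in XY_LABELS.items():
        n = counts.get(code, 0)
        pct = 100.0 * n / total if total else 0.0
        out.append(f"XY:Z:{code} -- {label}: {n} / {total} ({pct:.1f} %)")
    return out
```

To run it on one of the files named in the log, open the file with `gzip.open(path, "rt")` from the standard library and pass the handle to `tally_xy`.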

Is there still a way to modify this parameter in the current version, or do I need to retrieve an older one?
Is there a specific reason for removing this parameter, e.g., that more than 50 mapping positions are not meaningful for the ShortStack pipeline?

MikeAxtell commented 1 year ago

Yes, I deliberately made that change because, in my internal testing, once you get above 50 possible locations, there is no difference between placing a primary alignment and just guessing at random.
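The categories in the log hint at how each read is handled. The Python sketch below is a simplified reconstruction inferred only from those log labels, not ShortStack's actual code: reads with more than 50 alignments are left unmapped (M); multi-mappers are placed by a weighted random choice when uniquely mapped reads give some guidance (P); they are placed uniformly at random when there is no guidance and at most `ranmax` loci (R); otherwise they are left unmapped (O). The `weights` input, standing in for local unique-read density, and the weighted-choice rule are assumptions.

```python
import random
from typing import Optional, Sequence, Tuple

# Values taken from the log; treated here as fixed constants.
MAX_HITS = 50  # "-k 50 is always used" (formerly --bowtie_m)
RANMAX = 3     # "option ranmax 3"


def place_read(weights: Sequence[float], rng: random.Random) -> Tuple[str, Optional[int]]:
    """Classify one read and pick a primary locus (simplified, hypothetical logic).

    weights[i] stands in for the density of uniquely mapped reads near
    candidate locus i; len(weights) is the number of candidate loci.
    """
    n = len(weights)
    if n == 0:
        return "N", None  # no valid alignments
    if n > MAX_HITS:
        return "M", None  # too many alignments: left unmapped
    if n == 1:
        return "U", 0  # unique alignment
    if sum(weights) > 0:
        # Guided placement: probability proportional to unique-read density.
        return "P", rng.choices(range(n), weights=weights, k=1)[0]
    if n <= RANMAX:
        return "R", rng.randrange(n)  # no guidance, few loci: uniform random pick
    return "O", None  # no guidance and more than RANMAX loci
```

This is consistent with the reasoning above: when the weights carry no real information across many loci, the guided choice behaves like a uniform random pick, which lands on the true source locus only 1/n of the time (2% at n = 50).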

ShortStack will accept user-generated BAM files as input (via option --bamfile). So, if you are unsatisfied with the constraints of ShortStack's built-in bowtie wrapper, you can always make your own alignments with whichever aligner / settings you choose, and then run a ShortStack analysis with your own alignment(s).
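Following that suggestion, one possible workflow is to align with your own settings (for example, bowtie with a `-k` value above 50; that step is not shown) and then hand the resulting BAM file(s) to ShortStack. Below is a minimal Python sketch of the second step, assembling the ShortStack command. Only `--bamfile` is confirmed in this thread; `--genomefile`, `--outdir` and `--threads` are assumed here to be ShortStack 4 options and should be checked against `ShortStack --help` for your installed version. The helper names are hypothetical.

```python
import shlex
from typing import List, Sequence


def shortstack_bam_command(genome: str, bams: Sequence[str], outdir: str, threads: int = 1) -> List[str]:
    """Build a ShortStack command that analyzes user-supplied BAM alignments."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    return [
        "ShortStack",
        "--genomefile", genome,  # the same reference the BAMs were aligned to
        "--bamfile", *bams,      # alignments made with your own aligner/settings
        "--outdir", outdir,
        "--threads", str(threads),
    ]


def as_shell(cmd: List[str]) -> str:
    """Render the command as a single, safely quoted shell line for logging."""
    return shlex.join(cmd)
```

The command could then be run with `subprocess.run(cmd, check=True)`, with `as_shell(cmd)` used to record the exact command line alongside the results.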

That said, repeats can of course be tricky to interpret. Have you considered aligning your sRNA-seq data to repeat consensus sequences instead of the whole genome?
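Aligning to repeat consensus sequences turns the question into counting reads per consensus or per repeat class. Below is a minimal Python sketch. It assumes you already have a SAM file from aligning the reads to a consensus FASTA (the alignment step is not shown). It also assumes that consensus names follow a `name#Class/Family` pattern, such as the one used by RepeatModeler-style libraries; adjust `sep` if yours differ. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def reads_per_consensus(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped records per reference (repeat consensus) name."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only mapped primary records
        counts[fields[2]] += 1  # RNAME = consensus sequence the read hit
    return counts


def by_repeat_class(counts: Counter, sep: str = "#") -> Counter:
    """Collapse per-consensus counts to the text after `sep` (assumed naming scheme)."""
    grouped: Counter = Counter()
    for name, n in counts.items():
        grouped[name.split(sep, 1)[1] if sep in name else name] += n
    return grouped
```

Each primary record counts once. If the reads were collapsed, as the file names in the log suggest, you would also need to weight each record by its read abundance, which this sketch does not do.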

arnauddr37 commented 1 year ago

Then there is no advantage to retrieving an older version. Your suggestions are part of what I am currently trying. Thank you for the answer and the help.