MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

ranmax 1: avoid choosing primary alignment randomly at all #90

Closed: bbermudez closed this issue 1 year ago.

bbermudez commented 5 years ago

Dear Mike

I find the u and f mmap assignments very useful. However, I would like to avoid assigning a multi-mapping read to a location when there is no possible guidance, that is, I want to avoid reporting R-flag reads. I do want to keep P-flag reads, which were assigned based on the locations of uniquely mapping reads. Using mmap n is therefore not an option, because it would discard all multi-mapping reads.

I had a look at the README, and I think the ranmax parameter may help: "Reads with more than this number of possible alignment positions where the choice can't be guided by unequal will be reported as unmapped. Irrelevant if option mmap is set to n or r. Must be integer of 2 or greater or set to 'none' to disable. Default: 3"
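The ranmax rule in the quoted README text can be read as a small decision procedure. The sketch below is a hypothetical illustration, not ShortStack's code: the function name place_multimapper, the use of per-site counts of nearby uniquely mapping reads as the guidance signal, and the count-weighted choice are all assumptions. Unlike ShortStack, it accepts ranmax=1, to show what the behavior requested in this issue would look like.

```python
import random
from typing import Optional, Sequence, Tuple


def place_multimapper(unique_counts: Sequence[int], ranmax: Optional[int],
                      rng: random.Random) -> Tuple[str, Optional[int]]:
    """Hypothetical sketch: choose a site for a multi-mapped read.

    unique_counts[i] is the (assumed) number of uniquely mapping reads near
    candidate site i. Returns (flag, site_index); site_index is None when the
    read is reported as unmapped.
    """
    n = len(unique_counts)
    if n < 2:
        raise ValueError("a multi-mapped read needs at least two candidate sites")
    if len(set(unique_counts)) > 1:
        # Unequal counts: guidance is possible, so weight the choice by them.
        site = rng.choices(range(n), weights=unique_counts, k=1)[0]
        return "P", site
    if ranmax is not None and n > ranmax:
        # No guidance and more sites than ranmax: report as unmapped.
        return "O", None
    # No guidance but at most ranmax sites (or ranmax disabled): pick at random.
    return "R", rng.randrange(n)


rng = random.Random(42)
print(place_multimapper([0, 0], ranmax=2, rng=rng)[0])  # R
print(place_multimapper([0, 0], ranmax=1, rng=rng))     # ('O', None)
```

Under this reading, a read with two candidate sites and no nearby unique reads can only be reported as unmapped if ranmax is below 2, which is exactly the value the README disallows; with ranmax set to 'none' the check is skipped entirely and the read is always placed at random.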

I used the following genome and a single read as test data. The read has two possible mapping sites, and because there is only one read, no guidance from uniquely mapping reads is possible.

>genome
TTGCATGCGGTGGCTCCGCACACGTTTGATTGGAACGTGGTCAATGGTAGAAACCCCCCG
AGCGCTAGAGCACTCTGGCTCTACCGTTCGGTTGACACGTTCAGGTTCGGTGGGACCGGC
TGGCTCCGCACACGTTTGAT

>read1
TGGCTCCGCACACGTTTGAT
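As a quick check that read1 really has exactly two equally good sites in this genome, the hypothetical helper below (plain Python, not part of ShortStack) searches both strands for exact matches. The names exact_hits and revcomp are illustrative only; positions are reported 1-based, as in SAM.

```python
from typing import List, Tuple

# Test genome and read from this issue, with the FASTA lines joined.
GENOME = (
    "TTGCATGCGGTGGCTCCGCACACGTTTGATTGGAACGTGGTCAATGGTAGAAACCCCCCG"
    "AGCGCTAGAGCACTCTGGCTCTACCGTTCGGTTGACACGTTCAGGTTCGGTGGGACCGGC"
    "TGGCTCCGCACACGTTTGAT"
)
READ = "TGGCTCCGCACACGTTTGAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_hits(genome: str, read: str) -> List[Tuple[int, str]]:
    """Return (1-based start, strand) for every exact match, overlaps allowed."""
    hits = []
    for strand, query in (("+", read), ("-", revcomp(read))):
        start = genome.find(query)
        while start != -1:
            hits.append((start + 1, strand))
            start = genome.find(query, start + 1)
    return sorted(hits)


print(exact_hits(GENOME, READ))  # [(11, '+'), (121, '+')]
```

Both hits are perfect 20-nt matches on the plus strand, so nothing in the alignment itself distinguishes them.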

I tried setting ranmax to 1 so that the read would not be assigned randomly to either location, but, as the README states, the value must be an integer of 2 or greater, and I get an error message.

I would like this read to be flagged O ("unmapped because alignment number exceeded option ranmax x and no guidance was possible"), but instead it was flagged R ("multi-mapped with primary alignment chosen randomly").

Is there a way to achieve such functionality?

I also tried the parameters -mmap u and -ranmax 'none', but the read is still assigned randomly.

Thank you for your time

Cheers, Beto Bermudez

MikeAxtell commented 3 years ago

Another one of your comments that fell through the cracks, sorry! I will look into this as I work on the next major release.

MikeAxtell commented 1 year ago

Yes, the alignment philosophy of ShortStack is that reads are placed somewhere even if they match more than once. In fact, for the new release (currently in alpha testing on the 'ShortStack4' branch on GitHub), I went in the opposite direction from what you suggest: now every read that aligns somewhere gets placed. The very repetitive reads and the no-guidance reads are tagged as such and kept in the alignment.

If you wish to exclude them, you can filter the BAM file to remove alignments with the properties you want to exclude. Of course, you can also perform the alignments outside of ShortStack; ShortStack will accept any BAM file(s) as input as long as they meet the BAM specification.
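A minimal sketch of such a filter, working on SAM text so it needs only the Python standard library plus samtools for BAM conversion. It assumes the mapping flag (U, P, R, and so on) is stored in a SAM optional field named XY; that tag name, and the hypothetical helpers get_tag, filter_sam and main, are assumptions for illustration, so confirm the tag in your own BAM (for example with samtools view) before relying on it.

```python
import sys
from typing import FrozenSet, Iterable, Iterator, List, Optional

# ASSUMPTION: the mapping flag is in a SAM optional field named "XY";
# confirm the tag name in your own BAM before using this.
FLAG_TAG = "XY"


def get_tag(fields: List[str], tag: str) -> Optional[str]:
    """Return the value of optional SAM field `tag`, or None if it is absent."""
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields start at index 11.
    for opt in fields[11:]:
        parts = opt.split(":", 2)
        if len(parts) == 3 and parts[0] == tag:
            return parts[2]
    return None


def filter_sam(lines: Iterable[str], tag: str = FLAG_TAG,
               drop: FrozenSet[str] = frozenset({"R"})) -> Iterator[str]:
    """Yield SAM lines, skipping alignment records whose `tag` value is in `drop`."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if get_tag(fields, tag) in drop:
            continue  # e.g. a multi-mapper placed at random ("R")
        yield line


def main() -> None:
    """Read SAM text on stdin and write the filtered SAM text to stdout."""
    sys.stdout.writelines(filter_sam(sys.stdin))
```

Saved as a script with main() called at the bottom, it could sit in a pipeline such as `samtools view -h in.bam | python filter_flags.py | samtools view -b -o out.bam -`. Records without the tag, such as unmapped reads, are kept, and the drop set can be widened to remove other flag values as well.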