Closed pgonzale60 closed 6 years ago
Thanks Pablo. It's great to hear that you've found ShortStack useful and easy to use.
You are absolutely right; collapsing reads would certainly speed up mapping, and there are ways, like the one you pointed out, to integrate collapsed reads into the density hash used for placements. Believe me, I've considered this before. But there are several other issues:
This is not to say it couldn't be done; it could, and as I write this I can think of work-arounds for both of the issues above. For issue 1, I could output both a 'reduced' BAM file and a 'full' BAM file. For issue 2, I would have to buckle down and re-work all of the relevant code (there's a lot of it, actually). But it's doable.
I'll mark this as an enhancement, and when I have time sometime I will play with it.
Thanks again.
I'm going over old issues that are still open, including this one. After thinking about it I am marking this as a 'wontfix' for now. I really do want to keep the bam files that ShortStack creates and operates on standard.
Hi Mike,
I've been using ShortStack in a couple of projects, and it has been a very useful and easy-to-use tool for assigning multimapping reads and annotating non-coding regions with small RNA-seq data. However, I think its resource usage could be improved by reducing redundancy in the input reads. There are tools like tally that collapse identical reads into a single representative. I think ShortStack could make use of that by changing only the way it counts density, taking into account that each read is actually a representative of x reads.
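As a rough illustration of what collapsing means here, the following Perl sketch counts identical sequences and emits one FASTA record per distinct sequence. It is not tally's actual implementation. The `trn_<id>_<count>` header layout is an assumption inferred from the regex used later in this post, and the read sequences are made-up toy data.

```perl
use strict;
use warnings;

# Toy input (made-up sequences); real reads would come from a FASTA/FASTQ file.
my @reads = qw(
    TGACAGAAGAGAGTGAGCAC
    TGACAGAAGAGAGTGAGCAC
    TCGGACCAGGCTTCATTCCCC
    TGACAGAAGAGAGTGAGCAC
);

# Count how many times each distinct sequence occurs.
my %count;
$count{$_}++ for @reads;

# Emit one FASTA record per distinct sequence, most abundant first
# (ties broken alphabetically so the output is deterministic).
# Assumed header layout: >trn_<uniqueID>_<numberOfReadsItRepresents>
my $id = 0;
my @fasta;
for my $seq (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    $id++;
    push @fasta, ">trn_${id}_$count{$seq}\n$seq\n";
}

print @fasta;
```

With this toy input, four reads collapse into two records, `trn_1_3` and `trn_2_1`, so the aligner would only need to place two sequences instead of four.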
For example, I ran tally, which gave me FASTA files with identifiers of the form `>trn_${uniqueID}_${numberOfReadsItRepresents}`. Then I changed
++$$density{$key}
in the `else` branch commented as "A unique mapper", to
my ($number) = $read_bucket[0] =~ /.*trn_\d+_(\d+).*/; $$density{$key} += $number;
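To show the idea outside of ShortStack, here is a minimal, self-contained sketch of weighted density counting. The helper `read_weight`, the `chrom:position` keys, and the read names are hypothetical and are not taken from ShortStack's code. The sketch also falls back to a weight of 1 when a read name does not follow the `trn_<id>_<count>` pattern, which the one-line change above does not do.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the copy number from a read name of the
# form trn_<uniqueID>_<count>. Returns 1 for names that do not match,
# so uncollapsed reads still count once.
sub read_weight {
    my ($read_name) = @_;
    my ($count) = $read_name =~ /trn_\d+_(\d+)/;
    return defined $count ? $count : 1;
}

# Toy placements: [read name, density key]. The "chrom:position" key
# format is illustrative only.
my @placements = (
    [ 'trn_1_250',  'chr1:100' ],
    [ 'trn_2_3',    'chr1:100' ],
    [ 'plain_read', 'chr2:50'  ],
);

# Add each read's weight to the density hash instead of incrementing by 1.
my %density;
for my $placement (@placements) {
    my ($read_name, $key) = @$placement;
    $density{$key} += read_weight($read_name);
}

printf "%s\t%d\n", $_, $density{$_} for sort keys %density;
```

Here `chr1:100` ends up with a density of 253 (250 + 3) and `chr2:50` with 1.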
Best, Pablo