DarioS opened 5 years ago
Recent versions of STAR can merge overlapping pairs of reads into a single read for alignment. One drawback is that this makes it hard to calculate insert size with existing quality control tools, which don't know about merged reads. Can STAR do the calculations (mean and standard deviation of insert size) and put them in the existing mapping metrics log file?

Hi Dario,
this would be a useful feature; I have plans to add a variety of QC / diagnostic outputs. However, insert size for RNA-seq is not as straightforward to define as for DNA, because there may be splice junctions in the unsequenced space between non-overlapping mates.
The "merging of overlapping mates" occurs only inside the STAR algorithm; in the output alignments the mates are "unmerged" again, so it should not affect downstream tools?
Cheers Alex

I did not know that merged reads are not output as single reads. In that case, existing tools will suit me. I suppose you could set a threshold above which insert sizes are ignored. For example, it's unlikely that sequencing would work at all if the insert size were more than 1000 bases for Illumina RNA-seq. Such a cutoff would mostly avoid skewing the calculated size distribution.
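A minimal sketch of the thresholded calculation discussed above, run on STAR's ordinary (unmerged) output rather than inside STAR itself. It assumes SAM text as input (for example from `samtools view`) and uses the TLEN column (9th field) as the insert size. `insert_size_stats` and `max_insert` are hypothetical names, not part of STAR, and the 1000-base default is just the example value from this thread. Because TLEN is a genomic span, a pair with an intron in the unsequenced gap gets an inflated value; the cutoff only drops the largest of these, it does not correct them.

```python
import statistics
from typing import Iterable, Optional, Tuple

# SAM FLAG bits, as defined in the SAM format specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def insert_size_stats(
    sam_lines: Iterable[str], max_insert: int = 1000
) -> Optional[Tuple[float, float, int]]:
    """Return (mean, sample standard deviation, pair count) of insert sizes.

    Hypothetical helper: only the mate with a positive TLEN is counted, so
    each pair contributes once. Pairs with TLEN above max_insert (an assumed
    cutoff, e.g. spliced pairs spanning an intron) are ignored.
    """
    required = PAIRED | PROPER_PAIR
    excluded = UNMAPPED | SECONDARY | SUPPLEMENTARY
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        if (flag & required) != required or (flag & excluded):
            continue  # keep only primary, properly paired, mapped records
        if 0 < tlen <= max_insert:
            sizes.append(tlen)
    if len(sizes) < 2:
        return None  # a sample standard deviation needs at least two values
    return statistics.mean(sizes), statistics.stdev(sizes), len(sizes)


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
        "r1\t147\tchr1\t200\t255\t50M\t=\t100\t-150\t*\t*",
        "r2\t99\tchr1\t300\t255\t50M\t=\t420\t170\t*\t*",
        "r2\t147\tchr1\t420\t255\t50M\t=\t300\t-170\t*\t*",
        # spliced pair: the intron inflates TLEN, so the cutoff drops it
        "r3\t99\tchr1\t500\t255\t25M5000N25M\t=\t5650\t5200\t*\t*",
        "r3\t147\tchr1\t5650\t255\t50M\t=\t500\t-5200\t*\t*",
    ]
    print(insert_size_stats(example))
```

With the default cutoff, the spliced pair `r3` is ignored and only the two unspliced pairs contribute. Counting only positive TLEN values avoids double-counting each pair, at the cost of relying on the aligner to set TLEN signs as the SAM specification describes.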