alexdobin / STAR

RNA-seq aligner
MIT License

No Insert Size Metrics in Log File #579

Open DarioS opened 5 years ago

DarioS commented 5 years ago

Recent versions of STAR can merge overlapping pairs of reads into a single read for alignment. One drawback is that it makes it hard to calculate insert size with existing quality control tools which don't know about merged reads. Can STAR do the calculations (mean and standard deviation of insert size) and put them in the existing mapping metrics log file?

alexdobin commented 5 years ago

Hi Dario,

this would be a useful feature - I have plans to add a variety of QC / diagnostic outputs. However, the insert size for RNA-seq is not as straightforward as for DNA, as there may be junctions in the unsequenced space between non-overlapping mates.

The "merging of overlapping mates" occurs only inside the STAR algorithm; in the output alignments the mates are "unmerged" again, so it should not affect downstream tools?

Cheers Alex

DarioS commented 5 years ago

I did not know that the merged reads are not output as single reads, so existing tools will work for me. I suppose that you could set a threshold above which insert sizes are ignored. For example, it's unlikely that sequencing would work at all if the insert size were more than 1000 bases for Illumina RNA-seq. That would mostly avoid skewing the calculated size distribution.
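
For anyone wanting to try the threshold idea on existing output, here is a minimal sketch. It is not part of STAR. It assumes uncompressed SAM text and uses the TLEN column (column 9 of the SAM format) as the insert size. The function name `insert_size_stats` and the `max_insert` default of 1000 are my own choices for illustration. Because TLEN is measured in genomic coordinates, pairs that span an intron report inflated values, which is what the cutoff is meant to exclude.

```python
import statistics


def insert_size_stats(sam_lines, max_insert=1000):
    """Return (count, mean, stdev) of TLEN for properly paired reads.

    Hypothetical helper for illustration only; not a STAR feature.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM records have 11 mandatory columns
            continue
        flag = int(fields[1])
        tlen = int(fields[8])  # column 9: observed template length
        # Require "properly paired" (0x2) and skip secondary (0x100)
        # and supplementary (0x800) records.
        if not flag & 0x2 or flag & 0x900:
            continue
        # A positive TLEN appears on the leftmost mate only, so each pair
        # is counted once; values above the cutoff are ignored.
        if 0 < tlen <= max_insert:
            sizes.append(tlen)
    if len(sizes) < 2:  # stdev needs at least two values
        return len(sizes), None, None
    return len(sizes), statistics.mean(sizes), statistics.stdev(sizes)
```

With a real file, this could be called as `insert_size_stats(open("Aligned.out.sam"))`. The cutoff only reduces the skew from spliced pairs: a pair spanning a short intron can still fall under the threshold and be counted.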