broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License

Normalize coverage against the size of the target. #230

Closed LeeTL1220 closed 7 years ago

LeeTL1220 commented 8 years ago

We would insert a normalization step (right after pcov?) to normalize out the size of the target.

vruano commented 8 years ago

A natural way to do this is to divide the count by the length of the target. However, note that the number of overlapping reads is not proportional to the length of the target but to something more like length of the target + 2 * read length, which corresponds to the area covered by reads that overlap the target.

This should clearly be the case with very small targets. E.g. a 1bp target may have 100s of reads overlapping it; increasing the target size by 1bp (to 2bp) won't double the read count but will only increase it by a small marginal amount (~ average number of read starts per position).
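To make the small-target argument concrete, here is a minimal Python sketch. It is not GATK code: the function names, the fixed read length, and the uniform read-start density are all assumptions made for illustration. Under those toy assumptions a read overlaps the target if it starts in any of (target length + read length - 1) positions, so the example uses that as the padded length; the right padding for real data (e.g. something closer to 2 * read length, as suggested above) is left as a parameter.

```python
def expected_overlapping_reads(target_length, read_length, starts_per_position):
    """Expected number of reads overlapping a target.

    Toy model (an assumption, not GATK's): all reads have the same length
    and read starts are uniformly distributed along the genome.
    """
    # A read overlaps the target if it starts anywhere from
    # (read_length - 1) bases before the target up to its last base.
    return starts_per_position * (target_length + read_length - 1)


def normalize_by_length(count, target_length):
    """Naive normalization: divide the count by the target length."""
    return count / target_length


def normalize_by_padded_length(count, target_length, padding):
    """Normalize by the target length plus a padding term (hypothetical helper)."""
    return count / (target_length + padding)


if __name__ == "__main__":
    read_length = 100          # assumed fixed read length
    starts_per_position = 1.0  # assumed uniform read-start density

    for target_length in (1, 2, 200, 2000):
        count = expected_overlapping_reads(
            target_length, read_length, starts_per_position
        )
        naive = normalize_by_length(count, target_length)
        padded = normalize_by_padded_length(
            count, target_length, padding=read_length - 1
        )
        print(target_length, count, round(naive, 3), round(padded, 3))
```

In this model the naive per-base value is strongly inflated for the 1bp and 2bp targets and only approaches the underlying density for targets much longer than a read, whereas the padded normalization returns the same value for every target size.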

davidbenjamin commented 7 years ago

@samuelklee This will also be moot when the new coverage model is in place. Close?