cerebis / qc3C

Reference-free quality assessment for Hi-C sequencing data
GNU Affero General Public License v3.0

Denominator in long-range fraction #28

cerebis closed this issue 4 years ago

cerebis commented 5 years ago

When calculating the fraction of pairs with long-range separation (d > 1 kb, 5 kb, or 10 kb), we must ensure that the denominator is consistent with the numerator.

Originally, we only considered intra-contig pairs. However, we're now using a "greedy" method that also estimates separation for inter-contig pairs which meet a certain constraint: the location of one read of the pair must, on its own, account for the entire separation represented by the bin. In effect, when estimating inter-contig separation, each contig has shoulder regions (of the bin size) which we ignore.

The pairs counted in the denominator should also meet this constraint, rather than being simply "all pairs which map".

As it stands, our fractions will be slightly lower than they should be, because the denominator is "all pairs which mapped".
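
To make the mismatch concrete, here is a minimal sketch of a fraction whose numerator and denominator are drawn from the same set of pairs. It is not qc3C's code: the `Read` and `Pair` types, the helper names, and the reading of the shoulder constraint (an inter-contig pair is usable only if at least one read lies more than `d` from the nearest end of its contig) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (not a qc3C type)."""
    contig: str
    pos: int         # 0-based mapping position on the contig
    contig_len: int  # length of the contig the read maps to


@dataclass(frozen=True)
class Pair:
    r1: Read
    r2: Read


def dist_to_nearest_end(read: Read) -> int:
    """Distance from the read to the closer end of its contig."""
    return min(read.pos, read.contig_len - read.pos)


def long_range_fraction(pairs, d: int) -> float:
    """Fraction of eligible pairs with separation greater than d.

    Assumed eligibility rule:
      * intra-contig pairs are always eligible;
      * inter-contig pairs are eligible only if one read sits more than d
        from its contig end, i.e. outside the shoulder region. Such a pair
        is then also counted as long-range.
    Pairs failing the rule are dropped from BOTH counts.
    """
    eligible = 0
    long_range = 0
    for p in pairs:
        if p.r1.contig == p.r2.contig:
            eligible += 1
            if abs(p.r1.pos - p.r2.pos) > d:
                long_range += 1
        elif max(dist_to_nearest_end(p.r1), dist_to_nearest_end(p.r2)) > d:
            eligible += 1
            long_range += 1
        # otherwise: both reads lie in shoulder regions, so the pair is ignored
    return long_range / eligible if eligible else float("nan")


pairs = [
    Pair(Read("a", 100, 10_000), Read("a", 5_000, 10_000)),  # cis, long-range
    Pair(Read("a", 100, 10_000), Read("a", 300, 10_000)),    # cis, short-range
    Pair(Read("a", 5_000, 10_000), Read("b", 100, 10_000)),  # trans, eligible
    Pair(Read("a", 100, 10_000), Read("b", 9_950, 10_000)),  # trans, in shoulders
]
consistent = long_range_fraction(pairs, d=1_000)
naive = 2 / len(pairs)  # same numerator over "all pairs which mapped"
```

In this toy set the consistent fraction is 2/3, while dividing the same numerator by every mapped pair gives 2/4, which is the downward bias described above.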

cerebis commented 4 years ago

This has been resolved by restricting the calculation to the simple case of cis-mapped pairs, so that the numerator and denominator are drawn from the same set of pairs.
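
A rough sketch of what the cis-only calculation might look like follows. The tuple layout `(contig1, pos1, contig2, pos2)` and the function name are hypothetical and are not taken from the qc3C source; the thresholds are the 1 kb, 5 kb and 10 kb values mentioned above.

```python
def cis_long_range_fractions(pairs, thresholds=(1_000, 5_000, 10_000)):
    """Fraction of cis-mapped pairs separated by more than each threshold.

    `pairs` is an iterable of (contig1, pos1, contig2, pos2) tuples
    (a hypothetical layout). Only pairs with both reads on the same contig
    contribute, to the numerator and to the denominator alike.
    """
    separations = [abs(p1 - p2) for c1, p1, c2, p2 in pairs if c1 == c2]
    if not separations:
        return {d: float("nan") for d in thresholds}
    n_cis = len(separations)
    return {d: sum(s > d for s in separations) / n_cis for d in thresholds}


example = [
    ("a", 0, "a", 500),
    ("a", 0, "a", 2_000),
    ("a", 0, "a", 7_000),
    ("a", 0, "a", 20_000),
    ("a", 0, "b", 5),  # trans pair: excluded from both counts
]
fractions = cis_long_range_fractions(example)
```

For the five example pairs, four are cis-mapped, so the fractions are computed over a denominator of four regardless of how many trans pairs were mapped.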