Open gpratt opened 10 years ago
Is there a minimal replicating example?
There is a notebook with the details
I think this is the source of the difference: Coverage counts all overlapping reads. We count reads with their (start OR middle OR stop) in the region. Thus, with back-to-back clusters, reads are unambiguously assigned.
You wrote the cython version of the function that does this...
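For anyone validating independently, here is a rough sketch of how the two counting rules described above can disagree. It is not clipper's actual cython code or coverageBed itself; the function names, the 0-based half-open coordinates, and the toy reads are all assumptions made for illustration.

```python
def overlaps(read_start, read_stop, region_start, region_stop):
    """Overlap rule (coverage-style): count the read if it shares any base
    with the region. Intervals are assumed 0-based, half-open."""
    return read_start < region_stop and read_stop > region_start


def anchored(read_start, read_stop, region_start, region_stop):
    """Anchor rule: count the read only if its start, middle, or last base
    falls inside the region (hypothetical reading of the rule above)."""
    last = read_stop - 1
    middle = (read_start + last) // 2
    return any(region_start <= p < region_stop
               for p in (read_start, middle, last))


# Toy data (made up): one short peak and four reads.
peak = (100, 110)
reads = [
    (80, 150),   # spans the whole peak, but no anchor point lands inside it
    (95, 105),   # last base inside the peak
    (105, 130),  # start inside the peak
    (120, 150),  # entirely outside the peak
]

overlap_count = sum(overlaps(s, e, *peak) for s, e in reads)
anchor_count = sum(anchored(s, e, *peak) for s, e in reads)

print("overlap rule:", overlap_count)
print("anchor rule: ", anchor_count)
```

The read that spans the whole peak is the one the two rules disagree on: it overlaps the peak, but none of its three anchor points fall inside it, so the overlap count comes out higher.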
That's exactly what's going on. I think it's optimal, but it might make things difficult for people doing independent validations. Although with iCLIP we want to choose the read start as the location to count. I may swap out our counting system for htseq; it's a lot easier to work with.
Thoughts?
htseq is fine if it has the same behavior
CoverageBed counts more reads overlapping peaks than the number reported by clipper's internal count.
I'll investigate this. @mlovci, is there something about the read counting I'm forgetting that would cause the number of reads to be underreported in clipper, possibly something to do with the starting position?