YeoLab / clipper

A tool to identify CLIP-seq peaks

CLIPper count num reads strange #21

Open gpratt opened 10 years ago

gpratt commented 10 years ago

CoverageBed counts more reads overlapping the peaks than CLIPper's internal count reports for the same peaks.

I'll investigate this. @mlovci, is there something about the read counting that I'm forgetting that would cause CLIPper to under-report the number of reads, possibly related to the starting position or something similar?

mlovci commented 10 years ago

Minimal reproducing example?

gpratt commented 10 years ago

http://nbviewer.ipython.org/github/gpratt/iPython_Notebook/blob/master/CLIPper%20Read%20Counting%20Error.ipynb

The notebook has the details.

mlovci commented 10 years ago

I think this is the source of the difference: coverageBed counts every read that overlaps a region, whereas we count a read only if its start (or middle, or stop) falls inside the region. Thus, with back-to-back clusters, each read is unambiguously assigned to a single cluster.
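To make the difference concrete, here is a minimal pure-Python sketch of the two counting rules. It assumes reads and clusters are 0-based, half-open (start, end) intervals on a single strand; the names `overlap_count`, `position_count`, and the `use_pos` parameter are hypothetical illustrations, not CLIPper's actual Cython function or coverageBed's implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def overlap_count(reads: List[Interval], region: Interval) -> int:
    """coverageBed-style rule: count every read overlapping the region by >= 1 base."""
    r_start, r_end = region
    return sum(1 for start, end in reads if start < r_end and end > r_start)


def position_count(reads: List[Interval], region: Interval, use_pos: str = "start") -> int:
    """Position rule: count a read only if one chosen position lies inside the region."""
    r_start, r_end = region
    count = 0
    for start, end in reads:
        if use_pos == "start":
            pos = start
        elif use_pos == "middle":
            pos = start + (end - start) // 2
        elif use_pos == "stop":
            pos = end - 1  # last aligned base in half-open coordinates
        else:
            raise ValueError(f"unknown use_pos: {use_pos!r}")
        if r_start <= pos < r_end:
            count += 1
    return count


# Two back-to-back clusters and one read, (140, 170), spanning their boundary.
clusters = [(100, 150), (150, 200)]
reads = [(110, 140), (140, 170), (160, 190)]

for cluster in clusters:
    print(cluster, overlap_count(reads, cluster), position_count(reads, cluster))
# prints: (100, 150) 2 2
#         (150, 200) 2 1

overlap_total = sum(overlap_count(reads, c) for c in clusters)
position_total = sum(position_count(reads, c) for c in clusters)
print(len(reads), overlap_total, position_total)  # prints: 3 4 3
```

The spanning read is counted in both clusters by the overlap rule but only once by the position rule, so the overlap totals can exceed the number of reads, while the position-based totals never do.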

You wrote the cython version of the function that does this...

gpratt commented 10 years ago

That's exactly what's going on. I think it's the optimal approach, but it might make independent validation difficult for other people. Also, with iCLIP we want to use the read start as the position to count. I may swap our counting system out for htseq; it's a lot easier to work with.
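Counting by read start has to respect strand: on the minus strand a read begins at its rightmost aligned base. The following is a minimal pure-Python sketch, assuming 0-based, half-open (start, end, strand) tuples and non-overlapping clusters on each strand; `read_start` and `count_read_starts` are hypothetical names, not CLIPper's or htseq's API.

```python
from collections import Counter
from typing import Iterable, Tuple

# (start, end, strand): 0-based, half-open reference coordinates; strand is "+" or "-".
Read = Tuple[int, int, str]
Cluster = Tuple[int, int, str]


def read_start(read: Read) -> int:
    """Return the 5' end of the read in reference coordinates."""
    start, end, strand = read
    # On the minus strand the read begins at its rightmost aligned base.
    return start if strand == "+" else end - 1


def count_read_starts(reads: Iterable[Read], clusters: Iterable[Cluster]) -> Counter:
    """Assign each read to the same-strand cluster that contains its 5' end."""
    cluster_list = list(clusters)
    counts: Counter = Counter()
    for read in reads:
        pos = read_start(read)
        for cluster in cluster_list:
            c_start, c_end, c_strand = cluster
            if c_strand == read[2] and c_start <= pos < c_end:
                counts[cluster] += 1
                break  # clusters assumed non-overlapping, so at most one matches
    return counts


clusters = [(100, 150, "+"), (150, 200, "+"), (100, 200, "-")]
reads = [(140, 170, "+"), (140, 170, "-"), (190, 230, "-")]
counts = count_read_starts(reads, clusters)
print(dict(counts))  # prints: {(100, 150, '+'): 1, (100, 200, '-'): 1}
```

The third read overlaps the minus-strand cluster but starts outside it, so it is not counted at all; an overlap-based tool would count it, which is one more way the two approaches can disagree.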

Thoughts?

mlovci commented 10 years ago

htseq is fine if it has the same behavior