arq5x / bedtools

A powerful toolset for genome arithmetic.
http://code.google.com/p/bedtools/
GNU General Public License v2.0

bedtools fisher - how are the possible intervals estimated? #162

Open ohdongha opened 3 years ago

ohdongha commented 3 years ago

Hi, thanks again for providing this amazing toolkit.

The fisher command looks useful for quickly checking the significance of an overlap. Is there a more detailed explanation of how the number of all possible intervals is estimated? And is there a reference we could cite for that estimate when reporting results from the fisher test?

Reading through previous issues on fisher, I see that the authors have considered deprecating this command, although it has survived so far.

Will there be alternative, permutation-based ways to test the significance of overlap (or avoidance)? For example, as the manual page shows, the reldist command can compare the distribution of relative distances (between center positions?) for the input pair against that of a randomized pair (created by shuffle?). Is there a way to derive a p-value by comparing the input pair with a large number of randomized interval pairs? Alternatively, could the jaccard score of the input sets (this time computed on the actual intervals, not center positions?) be compared with the scores of a large number of randomized interval pairs?
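
To make the jaccard idea concrete, here is a rough Python sketch of the kind of permutation test I have in mind. It does not call bedtools and is not meant to describe how any bedtools command works internally. All function names are hypothetical, it assumes a single chromosome with half-open (start, end) intervals, and the random placement is only a stand-in for what shuffle would do (intervals may land on top of each other). The p-value uses the common (hits + 1) / (n_perm + 1) convention, which is my assumption rather than anything from the docs.

```python
import random


def merge(intervals):
    """Sort and merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bp(intervals):
    """Number of base pairs covered by at least one interval."""
    return sum(e - s for s, e in merge(intervals))


def jaccard(a, b):
    """Base-pair Jaccard statistic: |A intersect B| / |A union B|."""
    union = covered_bp(a + b)
    # Inclusion-exclusion on covered base pairs gives the intersection.
    inter = covered_bp(a) + covered_bp(b) - union
    return inter / union if union else 0.0


def shuffle_intervals(intervals, chrom_len, rng):
    """Move each interval to a uniformly random start, keeping its length."""
    out = []
    for s, e in intervals:
        length = e - s
        new_start = rng.randint(0, chrom_len - length)
        out.append((new_start, new_start + length))
    return out


def permutation_pvalue(a, b, chrom_len, n_perm=1000, seed=0):
    """One-sided empirical p-value for enrichment of overlap between a and b."""
    rng = random.Random(seed)
    observed = jaccard(a, b)
    # Count shuffles of a whose Jaccard with b is at least the observed value.
    hits = sum(
        jaccard(shuffle_intervals(a, chrom_len, rng), b) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy example on a hypothetical 100 kb chromosome.
a = [(100, 200), (500, 600), (900, 1000)]
b = [(150, 250), (550, 650), (950, 1050)]
observed, p = permutation_pvalue(a, b, chrom_len=100_000)
print(observed, p)
```

For avoidance rather than enrichment, the comparison would flip to counting shuffles with a score at or below the observed one. With real data I imagine the shuffling and scoring steps would be done by shuffle and jaccard themselves, looped from a script.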

Cheers, Dong-Ha