CollasLab / edd

Enriched Domain Detector for ChIP-seq data
https://pypi.python.org/pypi/edd
MIT License

question about reporting more domains #4

Closed: steffenheyne closed this issue 8 years ago

steffenheyne commented 9 years ago

Hi, I'm trying to call domains for broad histone marks like H3K27me3 and H3K9me3. It's working fine, but sometimes the domains are a bit coarse, and I'm currently playing around with the parameters to get a finer resolution and more of the peaks/domains. Currently I'm using bin_size=1kb gap_penalty=9 fdr=0.1 ci_lim=0.5. At least by eye, all called domains make sense, but I would also like to see less significantly enriched domains get called. I tried raising the fdr to 0.15 or even 0.2, but hardly any new peak regions appear. I verified this in a genome browser (IGV).

The log for, e.g., an H3K9me3 ChIP says:

NOTICE: eddlib.algorithm.max_segments: 2619 intervals (potential peaks) remaining.
[2015-09-03 12:33:09.295351] NOTICE: edd: Running 10000 monte carlo trials
[2015-09-03 12:55:28.401540] NOTICE: eddlib.algorithm.max_segments: got 1769 peaks with qvalue below 0.15. From 2619 possible
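To put those two counts side by side, here is a minimal Python sketch that pulls the numbers out of the log messages above and computes the share of candidate intervals that passed the q-value cutoff. The regular expressions and the `summarize` helper are hypothetical; they are written only against the two messages quoted here, not against any documented EDD log format.

```python
import re

# The two relevant messages, copied from the log above (timestamps dropped).
log_lines = [
    "NOTICE: eddlib.algorithm.max_segments: 2619 intervals (potential peaks) remaining.",
    "NOTICE: eddlib.algorithm.max_segments: got 1769 peaks with qvalue below 0.15. From 2619 possible",
]

# Hypothetical patterns: they only mirror the wording of the messages shown here.
CANDIDATES_RE = re.compile(r"(\d+) intervals \(potential peaks\) remaining")
REPORTED_RE = re.compile(r"got (\d+) peaks with qvalue below (\d+(?:\.\d+)?)\. From (\d+) possible")


def summarize(lines):
    """Collect candidate/reported counts and the q-value cutoff from log lines."""
    candidates = reported = cutoff = None
    for line in lines:
        match = CANDIDATES_RE.search(line)
        if match:
            candidates = int(match.group(1))
        match = REPORTED_RE.search(line)
        if match:
            reported = int(match.group(1))
            cutoff = float(match.group(2))
    if candidates is None or reported is None:
        raise ValueError("expected log messages not found")
    return {
        "candidates": candidates,
        "reported": reported,
        "cutoff": cutoff,
        "unreported": candidates - reported,
        "fraction_reported": reported / candidates,
    }


summary = summarize(log_lines)
print(f"{summary['reported']} of {summary['candidates']} candidates "
      f"({summary['fraction_reported']:.1%}) pass q < {summary['cutoff']}")
```

For this run that is 1769 of 2619 candidates (about 67.5%), so 850 intervals remain above the cutoff.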

How can I best increase the number of 2619 peaks? How can I get most of them reported? Probably by FDR?

There are also these "trivial intervals". What does this mean?
NOTICE: eddlib.algorithm.max_segments: Removed trivial intervals with score less than 5.0565.

Thanks!

eivindgl commented 9 years ago

Hi,

> I tried lowering the fdr to even 0.15 or 0.2 but this seems to crash then often due to a huge memory usage.

I don't think the fdr value should affect memory usage; if it does, that is certainly a strange bug. Do you run with many processes? Try running with -p 1 and see if that helps. You can also reduce the number of Monte Carlo trials to decrease run times. I don't think you would see much difference in the result between 1,000 and 10,000 trials (10,000 is the default).
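For context on the trial count, here is a generic Python sketch of an empirical Monte Carlo p-value. It is not EDD's code: the `+1` correction, the Gaussian null scores, and both function names are assumptions made purely for illustration. It shows that the number of trials mainly sets the smallest p-value that can be resolved, while the estimate for a moderately extreme score is similar at either setting.

```python
import random


def empirical_pvalue(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    Uses the common (1 + k) / (1 + n) form so the result is never exactly 0.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (1 + exceed) / (1 + len(null_scores))


def smallest_pvalue(n_trials):
    """Lowest p-value this estimator can return with n_trials null samples."""
    return 1 / (1 + n_trials)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
observed_score = 2.0    # made-up score, in units of the toy null distribution

for n_trials in (1000, 10000):
    # Toy null distribution: standard normal scores (an assumption).
    null_scores = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    p = empirical_pvalue(observed_score, null_scores)
    print(f"trials={n_trials:>6}  p~{p:.4f}  floor={smallest_pvalue(n_trials):.6f}")
```

Under these assumptions, fewer trials only coarsen the p-values of the most extreme intervals, which are well below the cutoff anyway.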

> How can I best increase the number of 2619 peaks?

There is no easy (and correct) way to increase the number of potential peaks. I'll have to think about this one a little more.

> How can I get most of them reported?

You are correct, by FDR. Please tell me if you still have memory issues when you run EDD with fewer processes. How much memory do you have?
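To illustrate why raising the FDR cutoff may add few or no peaks, here is a small Python sketch. It assumes the q-values behave like Benjamini-Hochberg adjusted p-values, which this thread does not confirm for EDD; the p-values are made up and the function names are hypothetical. The count of reported peaks only grows when a q-value falls between the old and the new cutoff.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values never decrease with rank.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        qvalues[index] = running_min
    return qvalues


def count_reported(qvalues, fdr):
    """Number of candidates with q-value strictly below the cutoff."""
    return sum(1 for q in qvalues if q < fdr)


# Made-up p-values for eight candidate intervals (illustration only).
pvalues = [0.001, 0.004, 0.02, 0.03, 0.09, 0.2, 0.5, 0.8]
qvalues = bh_qvalues(pvalues)

for fdr in (0.1, 0.15, 0.2):
    print(f"fdr={fdr}: {count_reported(qvalues, fdr)} of {len(qvalues)} reported")
```

In this toy set, going from 0.1 to 0.15 adds one interval and going from 0.15 to 0.2 adds none, because the next q-value is about 0.27.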