biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

How does EPIC work? #88

Open nservant opened 5 years ago

nservant commented 5 years ago

Hi, a short question about the EPIC output file. For some samples, EPIC seems to work pretty well. However, for others (same histone marks), almost all regions are significant... So I am trying to set up my own filter on the results.out file.

Here is an example:

  Chromosome   Start     End ChIP Input     Score    Log2FC            P
1       chr1   29600   77199  682  1024 2693.9908 0.3391119 1.478957e-09
2       chr1   78200   86199   63    77  373.0064 0.6359773 4.019591e-04
3       chr1  437000  474999  457   635 2127.5436 0.4509215 8.612973e-11
4       chr1 2480800 2494399   96   114  676.9330 0.6775564 7.038296e-06
5       chr1 2606400 2631799  217   256 1367.6917 0.6870352 2.883622e-11
6       chr1 2746400 2863199 1544  2279 6962.8744 0.3637558 8.185924e-22

The manual says: "The log2_fold change is the number of ChIP reads divided by the number of Input reads in the region (where a pseudocount is computed for regions with no input-reads.)" But in line 1, for instance, the Input has more reads than the ChIP, so I would expect Log2FC = log2(682/1024) = -0.586, which is not what is reported. Could you explain why, please? Thanks

endrebak commented 5 years ago

I am probably not using raw counts, but RPKM. Since the input library likely has more reads in total, that region has more ChIP than Input, relatively speaking.

If you could share the data (with me, not the world), I could look into it if you think something might be off. :) Thanks for reporting :)
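To illustrate the RPKM explanation, here is a minimal sketch using only the six rows posted in the question. It assumes (this is not confirmed from epic's source) that the reported value is log2 of the ChIP/Input ratio after each count is scaled by its library size. Under that assumption the reported Log2FC should differ from the raw log2(ChIP/Input) by the same constant, log2(N_input/N_chip), in every region; the region length used by RPKM cancels because both counts come from the same region.

```python
import math

# (ChIP reads, Input reads, reported Log2FC) for the six regions in the question.
regions = [
    (682, 1024, 0.3391119),
    (63, 77, 0.6359773),
    (457, 635, 0.4509215),
    (96, 114, 0.6775564),
    (217, 256, 0.6870352),
    (1544, 2279, 0.3637558),
]

offsets = []
for chip, inp, reported in regions:
    raw = math.log2(chip / inp)      # log2 fold change from raw counts
    offsets.append(reported - raw)   # what normalization would have to add
    print(f"raw={raw:+.4f}  reported={reported:+.4f}  offset={offsets[-1]:.4f}")

# If the offset is constant, it equals log2(N_input / N_chip) under the
# library-size assumption, so 2**offset is the implied library-size ratio.
mean_offset = sum(offsets) / len(offsets)
implied_ratio = 2 ** mean_offset
print(f"implied input/ChIP library-size ratio: {implied_ratio:.2f}")
```

The offset comes out essentially identical for all six rows, which is consistent with the input library being roughly 1.9 times larger than the ChIP library. It does not by itself prove which normalization epic applies.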

endrebak commented 5 years ago

If you find a large difference between SICER and epic, I'd love to hear about it. They are supposed to give nearly identical results (and guarantee the same ordering of the enriched regions); however, the FDR cutoff might differ slightly due to numerics.

nservant commented 5 years ago

Thanks for your feedback. I have not run SICER yet. However, I have another, bigger issue: EPIC does not seem to be reproducible! I ran EPIC 3 times with the same parameters and the same inputs, and I got completely different results! Is that expected? Of note, I'm running many samples in parallel... Is any file written where one sample could overwrite another? That's very strange... Thanks

endrebak commented 5 years ago

That is very worrisome. I have a suite of unit tests to ensure it works correctly. No, each run is independent.

I am working on a less memory-consuming version now. I can look into bugs. If you could compare with SICERpy, that would be great. And if you are able to reproduce an error please tell me.

I have been using epic to analyze the epigenome roadmap without problems, so my guess is that there isn’t a bug in epic, but that some combinations of versions of the libraries it depends on make it behave differently.

Would love to hear if you find a reproducible bug :)


nservant commented 5 years ago

Good news! This was my fault. I wrote a small script that does the BAM-to-BED conversion for both ChIP and control and then runs EPIC. But several of my samples share the same input, and when I ran them in parallel, the control BAM-to-BED conversions of the different samples overwrote each other. Sorry for the mistake. I have now fixed it, run it 4 times, and get exactly the same results. Many thanks!!
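For anyone hitting the same problem, one way to avoid this kind of collision is to give every job its own output location for the converted files. The sketch below is only an illustration: the helper `bed_path`, the `work` directory, and the sample and file names are all hypothetical and not taken from the script described above.

```python
from pathlib import Path


def bed_path(outdir, sample, bam):
    """Return a BED output path that is unique per sample (hypothetical helper).

    Putting the sample name in the path means two jobs that share the same
    control BAM never write to the same file.
    """
    return Path(outdir) / sample / (Path(bam).stem + ".bed")


# Hypothetical jobs: (sample name, ChIP BAM, control BAM); both share one input.
jobs = [
    ("sampleA", "chipA.bam", "input1.bam"),
    ("sampleB", "chipB.bam", "input1.bam"),
]

control_beds = [bed_path("work", sample, control) for sample, _, control in jobs]
for path in control_beds:
    print(path)
```

An alternative design is to convert each shared control once, before the parallel jobs start, and let all jobs read the same finished file.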

endrebak commented 5 years ago

Ah, I have often done similar things and reported it as a bug. No worries.

Also, to prioritize results you can choose the 1,000 regions with the best FDR, for example :) It depends on what you want to do.
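A minimal sketch of that kind of ranking, using the six rows from the question. The excerpt shows no FDR column, so this sorts by the `P` column as a stand-in, which is an assumption; the function name `top_regions` is hypothetical.

```python
# (Chromosome, Start, End, P) taken from the example rows in the question.
rows = [
    ("chr1", 29600, 77199, 1.478957e-09),
    ("chr1", 78200, 86199, 4.019591e-04),
    ("chr1", 437000, 474999, 8.612973e-11),
    ("chr1", 2480800, 2494399, 7.038296e-06),
    ("chr1", 2606400, 2631799, 2.883622e-11),
    ("chr1", 2746400, 2863199, 8.185924e-22),
]


def top_regions(rows, n):
    """Return the n regions with the smallest P value (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[3])[:n]


# With real data, n would be something like 1000.
top = top_regions(rows, 3)
for chrom, start, end, p in top:
    print(chrom, start, end, p)
```

Ranking and keeping a fixed number of regions sidesteps the question of where to put a significance cutoff, at the cost of keeping weak regions in samples with few real domains.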

Also, I wrote epic because it seems like the best for H3K27me3. I have not tested it extensively on other targets except PolII and H3K4me3, where MACS2 seemed like a better fit.

MACS2 claims to work on H3K27me3 and SICER on shorter histone marks, but I think that is bad advice. A cynical person might think it was to get more citations ;)