XiaoTaoWang / EagleC

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

Capture HiC data #20

Open yfarjoun opened 1 year ago

yfarjoun commented 1 year ago

Hello @XiaoTaoWang.

Thanks for maintaining such an organized site for installing and running EagleC.

I noticed that the paper says EagleC can run on capture Hi-C data, but I didn't see any detailed instructions on how to actually do this. In particular, how does one get around the fact that capture data has a particular pattern due to the capture technology? Does the normalization help with that? If so, which normalization should be used: CNV or ICE? (As an aside, what does "ICE" stand for?)

Related: what methods/scripts/functions did you use to evaluate performance? Clearly there are many ways to compare a call set to a truth set, and the details matter, so I was wondering whether you have made the evaluation scripts publicly available.

Thanks!

Yossi

XiaoTaoWang commented 1 year ago

Dear Yossi,

Thank you for your interest. First of all, ICE refers to "Iterative Correction and Eigenvector decomposition", a Hi-C data normalization method developed by Dr. Leonid A. Mirny's lab in 2012 (DOI: 10.1038/nmeth.2148).
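For readers unfamiliar with the method, the iterative-correction part of ICE can be sketched in a few lines of plain Python. This is only a minimal illustration of the idea (repeatedly dividing each pixel by the relative coverage of its two bins until all covered bins have equal row sums); it is not the code used by EagleC or by any Hi-C library. The function name `ice_balance`, its parameters, and the toy matrix are hypothetical, and production implementations typically also filter low-coverage bins and ignore the first diagonals, which this sketch does not do.

```python
def ice_balance(matrix, max_iter=200, tol=1e-5):
    """Balance a symmetric contact matrix by iterative correction.

    Returns the balanced matrix and the accumulated per-bin bias vector.
    Hypothetical helper for illustration only.
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        covered = [s for s in sums if s > 0]
        if not covered:  # empty map: nothing to balance
            break
        mean = sum(covered) / len(covered)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        # Stop once the correction applied in this round was negligible.
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


# Toy example: a balanced map distorted by known per-bin biases.
true_map = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
bin_bias = [1, 2, 4]
observed = [[bin_bias[i] * bin_bias[j] * true_map[i][j] for j in range(3)]
            for i in range(3)]

balanced, bias = ice_balance(observed)
print([round(sum(row), 3) for row in balanced])  # row sums come out equal
```

The recovered `bias` vector is proportional to the distortion that was applied, and the balanced matrix is the undistorted map up to a constant scale factor.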

In our paper, we assessed the performance of EagleC on several region capture Hi-C datasets, where we knew the actual SV coordinates, and we found that EagleC accurately predicted them in all cases (achieving 100% recall), with no other pixels on the capture Hi-C maps being identified as SVs (achieving 100% precision). Since region capture Hi-C is essentially high-resolution Hi-C in local regions, it is reasonable to use the same Hi-C guidelines to predict SVs on these contact maps, i.e., predictions should be combined from 5kb, 10kb, and 50kb resolutions, and both raw and normalized matrices can be used (although sensitivity and specificity may differ for different normalization methods).
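To make the recall and precision figures concrete, here is one simple way such numbers can be computed from a call set and a truth set. This is a generic sketch and not the evaluation code used in the paper: the function `match_calls`, the tuple layout `(chrom1, pos1, chrom2, pos2)`, the 50 kb tolerance, and the example coordinates are all assumptions made for illustration. It also assumes both breakpoints of every SV are listed in a consistent order.

```python
def match_calls(calls, truth, tol=50_000):
    """Greedy one-to-one matching of predicted SVs to known SVs.

    Each SV is a tuple (chrom1, pos1, chrom2, pos2). A call matches a truth
    entry when the chromosomes agree and both breakpoints lie within `tol` bp.
    Returns (precision, recall). Hypothetical helper for illustration only.
    """
    def close(a, b):
        return (a[0] == b[0] and a[2] == b[2]
                and abs(a[1] - b[1]) <= tol
                and abs(a[3] - b[3]) <= tol)

    unmatched = list(truth)  # truth entries not yet claimed by a call
    true_positives = 0
    for call in calls:
        for known in unmatched:
            if close(call, known):
                unmatched.remove(known)  # each truth SV can be matched once
                true_positives += 1
                break
    # Reported as 0.0 when undefined (no calls or no truth entries).
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up coordinates: one known SV, one correct call, one spurious call.
truth = [("chr1", 1_000_000, "chr5", 2_500_000)]
calls = [("chr1", 1_010_000, "chr5", 2_495_000),
         ("chr2", 300_000, "chr2", 9_000_000)]

precision, recall = match_calls(calls, truth)
print(precision, recall)  # prints: 0.5 1.0
```

The choice of tolerance and of one-to-one versus many-to-one matching changes the reported numbers, which is why the details of the comparison matter.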

However, EagleC has not yet been optimized for promoter capture Hi-C (or any capture Hi-C that enriches discrete loci/elements in the genome). Based on our limited tests, ICE normalization should be used to ensure reasonable accuracy on these platforms.

I hope this information is helpful.

Best, Xiaotao

yfarjoun commented 1 year ago

Thanks for the references and the information!

clolalan7 commented 1 year ago

Considering that KR is similar to ICE, would it be possible to add this normalization? That way, data from pipelines producing .hic files could be used without having to re-normalize to have Hi-C.

Thank you for considering,