XiaoTaoWang / EagleC

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

EagleC detecting too few SVs #22

Closed nyuhic closed 1 year ago

nyuhic commented 1 year ago

Hello,

I am using EagleC to call SVs on some HiChIP data for cancer cell lines. For most of my samples it detects <50 SVs. WGS data for some of these samples, however, shows >1000 SVs. I start my analysis from HiC-Pro valid pair files (~75M pairs each), which I convert to cool format, balance, and pass to EagleC. I am running everything with default parameters. Do you have any comments or suggestions for things I can try?

Thanks

XiaoTaoWang commented 1 year ago

In my experience, detecting fewer than 50 structural variants (SVs) from Hi-C data can be reasonable.

There are two main reasons why Hi-C typically detects fewer SVs than WGS. First, many of the SVs detected by WGS are smaller than 10kb, which is below the resolution of Hi-C (by default, EagleC combines SV calls from 5kb, 10kb, and 50kb resolutions). Second, not all SVs detected by WGS cause abnormal chromatin interactions around the breakpoints. As a result, some WGS calls cannot be confirmed by Hi-C and may be false positives.
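
As a rough illustration of the first point, one way to make the two call sets more comparable is to drop intrachromosomal WGS calls whose breakpoints are closer together than the size mentioned above before counting. This is only a sketch and not part of EagleC: the `SV` tuple, the `hic_comparable` helper, and the example coordinates are all hypothetical, and the 10 kb cutoff is an assumption taken from the threshold discussed in this comment.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: two breakpoints."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def hic_comparable(svs: List[SV], min_span: int = 10_000) -> List[SV]:
    """Keep SVs that a Hi-C based caller could plausibly resolve.

    Interchromosomal SVs are always kept. Intrachromosomal SVs are kept
    only if their breakpoints are at least `min_span` bp apart
    (assumed cutoff, not an EagleC parameter).
    """
    kept = []
    for sv in svs:
        if sv.chrom1 != sv.chrom2:
            kept.append(sv)
        elif abs(sv.pos2 - sv.pos1) >= min_span:
            kept.append(sv)
    return kept


# Made-up WGS calls, for illustration only.
wgs_calls = [
    SV("chr1", 1_000_000, "chr1", 1_002_500),    # 2.5 kb event, too small
    SV("chr1", 5_000_000, "chr1", 5_800_000),    # 800 kb event
    SV("chr3", 12_000_000, "chr8", 40_000_000),  # translocation
]

comparable = hic_comparable(wgs_calls)
print(len(wgs_calls), "WGS calls,", len(comparable), "at Hi-C scale")
```

Comparing the EagleC count against the filtered list rather than the raw WGS total gives a fairer picture of how many calls are actually missing. It does not address the second point, since a large SV may still leave no abnormal contact signal.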

nyuhic commented 1 year ago

Thanks for the clarification!