XiaoTaoWang / EagleC

A deep-learning framework for predicting a full range of structural variations from bulk and single-cell contact maps

Application to dm6 #27

Open MarcoDiS opened 1 year ago

MarcoDiS commented 1 year ago

I am studying possible SVs in dm6 and I managed to do the entire analysis with default parameters at 5, 10, and 100kb. I was wondering if it is possible to polish my analysis by doing the following:

1 - I would like to exclude all peri-centromeric regions from the analysis, and I already have them annotated. Is there a way to provide a list of regions that the software should not consider? The alternative would be to artificially remove all contacts in these regions from the input .cool file. Would this work?

2 - I used the default values for the probability cutoffs at 5, 10, and 50kb. Is there a way to optimize them, or should I do some kind of manual search for the best parameters?

3 - Is there any a priori reason to prefer ICE or RAW maps as input, or should the result not depend on this choice?

Thanks for your time, Marco

XiaoTaoWang commented 1 year ago

Hi Marco,

Sorry for the late response.

  1. If you already have the coordinates of the peri-centromeric regions, I think the best way is to manually remove the SV predictions by EagleC that are located within or near those regions.
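  2. There is always a tradeoff between specificity and sensitivity: a higher probability cutoff keeps fewer but more confident predictions, while a lower one keeps more predictions at the cost of more false positives.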
  3. Based on my experience, ICE tends to yield fewer but more accurate predictions, whereas RAW produces a slightly higher number of predictions with lower accuracy. Again, if you want to detect as many genomic rearrangements as possible, I would recommend running EagleC with both ICE and RAW, and then combining the results by taking the union.

Best, Xiaotao
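
The two post-processing steps suggested above (taking the union of the ICE and RAW predictions, then discarding calls within or near annotated regions) could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of EagleC: the `SV` record, the function names, the `tol` and `margin` parameters, and all coordinates are hypothetical, and the predictions are assumed to have already been parsed from the EagleC output into (chrom1, pos1, chrom2, pos2, strands) records.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical record for one predicted SV (two breakpoints plus orientation).
class SV(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strands: str

# Excluded region as (chrom, start, end), half-open like BED intervals.
Region = Tuple[str, int, int]

def near_any_region(chrom: str, pos: int, regions: List[Region], margin: int = 0) -> bool:
    """True if pos lies inside any region on chrom, padded by margin bp on each side."""
    return any(c == chrom and start - margin <= pos < end + margin
               for c, start, end in regions)

def drop_excluded(svs: List[SV], regions: List[Region], margin: int = 0) -> List[SV]:
    """Keep only SVs with neither breakpoint within or near an excluded region."""
    return [sv for sv in svs
            if not (near_any_region(sv.chrom1, sv.pos1, regions, margin)
                    or near_any_region(sv.chrom2, sv.pos2, regions, margin))]

def union_calls(ice: List[SV], raw: List[SV], tol: int = 0) -> List[SV]:
    """Union of two call sets; calls whose breakpoints agree within tol bp count as one."""
    merged = list(ice)
    for sv in raw:
        duplicate = any(sv.chrom1 == m.chrom1 and sv.chrom2 == m.chrom2
                        and sv.strands == m.strands
                        and abs(sv.pos1 - m.pos1) <= tol
                        and abs(sv.pos2 - m.pos2) <= tol
                        for m in merged)
        if not duplicate:
            merged.append(sv)
    return merged

# Placeholder data only: these are NOT real dm6 annotations or EagleC results.
excluded = [("chr2L", 22_000_000, 23_000_000)]
ice_calls = [SV("chr2L", 1_000_000, "chr3R", 5_000_000, "+-"),
             SV("chr2L", 22_500_000, "chrX", 2_000_000, "++")]
raw_calls = [SV("chr2L", 1_005_000, "chr3R", 5_000_000, "+-"),
             SV("chr3L", 700_000, "chr3L", 9_000_000, "-+")]

merged = union_calls(ice_calls, raw_calls, tol=10_000)
kept = drop_excluded(merged, excluded, margin=50_000)
```

In this sketch, the first RAW call is treated as the same event as the first ICE call because both breakpoints agree within `tol`, and the call with a breakpoint inside the placeholder region is dropped. How large `tol` and `margin` should be depends on the resolution of the maps and is left as a judgment call.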