XiaoTaoWang / NeoLoopFinder

A computation framework for genome-wide detection of enhancer-hijacking events from chromatin interaction data in re-arranged genomes
Other
53 stars 16 forks source link

calculate-cnv runtime error #27

Open katreya42 opened 2 years ago

katreya42 commented 2 years ago

Hi there and thank you for this great tool.

I'm trying to run CNV-normalization on MNase-digested HiChIP samples that had been processed through HiC-Pro and JuicerTools to make hic maps.

Running 'hic2cool convert' and 'cooler balance' both complete fine (cool file generated is at 10kb resolution).

Setting the '--enzyme uniform' flag leads to a runtime error:

calculate-cnv --hic MG63_3_3T3_rep2_GW7_10000.cool --genome hg19 --enzyme uniform --output MG63_3_3T3_rep2_GW7_10000_detected_cnv.txt ARGUMENT LIST: Cool URI = MG63_3_3T3_rep2_GW7_10000.cool Reference Genome = hg19 Enzyme = uniform Output Path = MG63_3_3T3_rep2_GW7_10000_detected_cnv.txt Cache Folder = /mnt/pan/SOM_GEN_PXS183/jdc14/HICHIP/OSTEO/CAS2997_Homo_sapiens_072921/MG63_3_3T3_rep2_GW7 Log file name = cnv-calculation.log root INFO @ 11/18/21 11:31:19: hg19_mappability_50mer.bw was not found in the cache folder, downloading ... root INFO @ 11/18/21 11:32:07: hg19.uniform.npz was not found in the cache folder, downloading ... root INFO @ 11/18/21 11:32:14: hg19_1kb_GCPercent.bw was not found in the cache folder, downloading ... root INFO @ 11/18/21 11:32:28: Calculate the 1D coverage from Hi-C matrix ... root INFO @ 11/18/21 11:32:39: Load GC content ... root INFO @ 11/18/21 11:35:41: Load mappability scores ... root INFO @ 11/18/21 11:39:11: Count the number of cut sizes for each bin ... root INFO @ 11/18/21 11:41:43: Filter out invalid bins ... numexpr.utils INFO @ 11/18/21 11:41:43: Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. numexpr.utils INFO @ 11/18/21 11:41:43: NumExpr defaulting to 8 threads. root INFO @ 11/18/21 11:41:43: Done root INFO @ 11/18/21 11:41:43: Fitting a Generalized Additive Model with log link and Poisson distribution ... Traceback (most recent call last): File "/home/jdc14/miniconda3/envs/neoloop/bin/calculate-cnv", line 147, in run gam = mgcv.gam(fomula, family=stats.poisson(link='log')) File "/home/jdc14/miniconda3/envs/neoloop/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 178, in call return super(SignatureTranslatedFunction, self).call(*args, *kwargs) File "/home/jdc14/miniconda3/envs/neoloop/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 106, in call res = super(Function, self).call(new_args, **new_kwargs) rpy2.rinterface.RRuntimeError: Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom

I tried setting '--enzyme Arima' out of curiosity (which I had used in the past for some Arima data) and calculate-cnv completed with no issues.

Attached is a shot of the unnormalized hic matrix at 10kb resolution around chr8:119Mb-125Mb to get an idea of matrix sparsity.

Thanks for any help you can provide.

Screen Shot 2021-11-18 at 5 27 19 PM
XiaoTaoWang commented 2 years ago

Hi, the GAM model we are fitting in "calculate-cnv" is designed to account for biased induced by GC content, mappability, and density of restriction enzyme sites. I think the reason for this error is that pulldown-based contact maps contain additional sources of variances that cannot be explained by the model. Have you tried running "calculate-cnv" at lower resolutions (25kb/50kb/100kb)?

beoungl commented 2 years ago

I had exactly the same issue; I was using Micro-C, so also MNase digested, and ran into exact same error message at 10kb.

Processed the data at 50kb, and it seems to be working now.

DittmanC commented 2 years ago

I had exactly the same issue; I was using Micro-C, so also MNase digested, and ran into exact same error message at 10kb.

Processed the data at 50kb, and it seems to be working now.

same issue for my data at 10kb. 100kb is fine, but the assemble-complexSVs seems require higher resolution at least 25kb?

XiaoTaoWang commented 2 years ago

Hey guys, thanks for these valuable feedbacks, I will seriously consider this issue in my future versions. Since I might need more time to investigate this specific case for --enzyme uniform, I just want to remind you that you don't have to perform the CNV normalization before you run assemble-complexSVs and neoloop-caller. In theory running in --balance-type ICE mode can also give equally accurate results if your sample doesn't have severe copy number variations.