broadinstitute / ABC-Enhancer-Gene-Prediction

Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)
MIT License

Can't compute powerlaw fit on VC normalized HiC files #54

Closed lindaboshans closed 11 months ago

lindaboshans commented 3 years ago

Hello,

I have been trying to use my own HiC data for the ABC model. Because the original .hic files contain different resolutions, I am using the ".VCnorm" and ".VCobserved" files I generated with juicebox_dump.py. I am now trying to compute the power-law fit using compute_powerlaw_fit_from_hic.py, but I get the following error:

Using: /sc/arion/projects/YangLab_NGS/data/hic_for_ABC/AD_VC/chr22/chr22.VCobserved.gz
Working on /sc/arion/projects/YangLab_NGS/data/hic_for_ABC/AD_VC/chr22/chr22.VCobserved.gz
Loading HiC
hic.to.sparse: Elapsed time: 0.23026204109191895

Traceback (most recent call last):
  File "compute_powerlaw_fit_from_hic.py", line 70, in load_hic_for_powerlaw
    interpolate_nan=False)
  File "/sc/arion/projects/YangLab_NGS/ABC-Enhancer-Gene-Prediction/src/hic.py", line 52, in load_hic
    apply_diagonal_bin_correction = apply_diagonal_bin_correction)
  File "/sc/arion/projects/YangLab_NGS/ABC-Enhancer-Gene-Prediction/src/hic.py", line 80, in process_hic
    assert(np.max(sums[sums > 0])/np.min(sums[sums > 0]) < 1.001)
AssertionError

Is there something wrong with my VC normalized files? Really appreciate any help/advice, as I have been stuck on this problem for days. Thank you.

thouis commented 3 years ago

Hi Linda,

Could you paste the command arguments you're using for compute_powerlaw_fit_from_hic.py? We might also want to grab a copy of your VC files for debugging.

@jnasser3 - can you comment? I'm trying to trace through the logic around checking this ratio, as well as the allow_vc flag (should it be false in this case?).

jnasser3 commented 3 years ago

This ratio was intended to only be checked for KR normalized matrices. I don't think VC normalized matrices are expected to pass this assertion. If you want to use VC data, allow_vc should be set to true.
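For readers following along, here is a minimal sketch of what a check like the one in the traceback tests. It assumes that `sums` holds the per-bin sums of the contact matrix, which is a guess based on the variable name rather than on the actual code in hic.py. The helper `row_sum_ratio` and both example matrices are made up for illustration. A balanced matrix, which is what KR normalization aims for, has near-identical row sums and passes; a matrix with uneven row sums does not.

```python
import numpy as np


def row_sum_ratio(matrix):
    """Return max/min over the positive per-bin sums of a square matrix."""
    sums = np.asarray(matrix, dtype=float).sum(axis=0)
    positive = sums[sums > 0]
    return float(np.max(positive) / np.min(positive))


# Hypothetical balanced matrix: every row and column sums to 1.
balanced = np.array([[0.5, 0.3, 0.2],
                     [0.3, 0.4, 0.3],
                     [0.2, 0.3, 0.5]])

# Hypothetical unbalanced matrix: the bin sums are 6, 4 and 10.
unbalanced = np.array([[5.0, 1.0, 0.0],
                       [1.0, 2.0, 1.0],
                       [0.0, 1.0, 9.0]])

balanced_ratio = row_sum_ratio(balanced)
unbalanced_ratio = row_sum_ratio(unbalanced)

print("balanced passes the 1.001 threshold:", balanced_ratio < 1.001)
print("unbalanced passes the 1.001 threshold:", unbalanced_ratio < 1.001)
```

Under this reading, a matrix that is not balanced would trip the assertion whenever the code path reaches it, which is consistent with the comment above that the check was only meant for KR data.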

The code shouldn't be getting to this assertion for VC data. First thing I can think of is to make sure the VC file is not empty.
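A quick way to confirm that a gzip-compressed dump file has content is to count its first few non-blank lines. The sketch below uses only the standard library; the helper name `count_data_lines` and the three-column example rows are illustrative assumptions, not part of the ABC code base.

```python
import gzip
import os
import tempfile


def count_data_lines(path, limit=5):
    """Count up to `limit` non-blank lines in a gzip-compressed text file."""
    count = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                count += 1
                if count >= limit:
                    break
    return count


# Demo on two temporary files: one with made-up rows, one with no content.
tmpdir = tempfile.mkdtemp()
filled_path = os.path.join(tmpdir, "example.VCobserved.gz")
empty_path = os.path.join(tmpdir, "empty.VCobserved.gz")

with gzip.open(filled_path, "wt") as handle:
    handle.write("0\t0\t12.5\n0\t5000\t3.0\n")
with gzip.open(empty_path, "wt") as handle:
    pass  # write nothing, leaving a valid but empty gzip stream

filled_count = count_data_lines(filled_path)
empty_count = count_data_lines(empty_path)
print("lines found:", filled_count, "and", empty_count)
```

A count of zero for the real file would point to a problem in the dump step rather than in the power-law script.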

lindaboshans commented 3 years ago

Hi thouis and jnasser3,

Thank you for taking the time to help me with this!

These are the command arguments I am using in my script:

python compute_powerlaw_fit_from_hic.py --hicDir $hic/Ngn2_VC/ --outDir $hic/Ngn2_VC/powerlaw/ --maxWindow 1000000 --minWindow 5000 --resolution 5000

I've also attached my chr22 VCnorm and VCobserved files. I checked my VC files and they are not empty.

In terms of setting allow_vc to true, is line 59 of compute_powerlaw_fit_from_hic.py what you are referring to?

if args.hic_type == 'juicebox':
    hic_file, hic_norm_file, hic_is_vc = get_hic_file(chrom, args.hicDir, allow_vc=False)

I've tried both with it set to false and true and I keep getting the same assertion error.

chr22.VCnorm.gz chr22.VCobserved.gz

weiwsmiling commented 3 years ago

Dear all,

I used KR matrices and got the same error when predict.py processed chrM:

Making predictions for chromosome: chrM
Making putative predictions table...
Using: HiC/chrM/chrM.KRobserved.gz
Begin HiC
Loading HiC
hic.to.sparse: Elapsed time: 0.006700038909912109
Traceback (most recent call last):
  File "ABC-Enhancer-Gene-Prediction-0.2/src/predict.py", line 141, in <module>
    main()
  File "ABC-Enhancer-Gene-Prediction-0.2/src/predict.py", line 103, in main
    this_chr = make_predictions(chromosome, this_enh, this_genes, args)
  File "/gpfs/gsfs11/users/wuw11/Jacob/Neuron_EdU/ABCmodel/ABC-Enhancer-Gene-Prediction-0.2/src/predictor.py", line 14, in make_predictions
    pred = add_hic_to_enh_gene_table(enhancers, genes, pred, hic_file, hic_norm_file, hic_is_vc, chromosome, args)
  File "/gpfs/gsfs11/users/wuw11/Jacob/Neuron_EdU/ABCmodel/ABC-Enhancer-Gene-Prediction-0.2/src/predictor.py", line 63, in add_hic_to_enh_gene_table
    gamma = args.hic_gamma)
  File "/gpfs/gsfs11/users/wuw11/Jacob/Neuron_EdU/ABCmodel/ABC-Enhancer-Gene-Prediction-0.2/src/hic.py", line 47, in load_hic
    apply_diagonal_bin_correction = apply_diagonal_bin_correction)
  File "/gpfs/gsfs11/users/wuw11/Jacob/Neuron_EdU/ABCmodel/ABC-Enhancer-Gene-Prediction-0.2/src/hic.py", line 75, in process_hic
    assert(np.max(sums[sums > 0])/np.min(sums[sums > 0]) < 1.001)
AssertionError

I am wondering how I should fix it.

Thanks in advance!

thouis commented 3 years ago

@lindaseong - I think your data may work if you change line 59 of compute_powerlaw_fit_from_hic.py to:

hic_file, hic_norm_file, hic_is_vc = get_hic_file(chrom, args.hicDir, allow_vc=True)

@weiwsmiling - can you post your HiC/chrM/chrM.* files?

weiwsmiling commented 3 years ago

Thanks Thouis for your response. I removed the enhancers and promoter on chrM from the list and now it works well.
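For anyone who wants to apply the same workaround, a minimal sketch of dropping chrM entries from a tab-separated list is shown below. It assumes the chromosome name is in the first column, as in BED-like files; the helper `drop_chromosome` and the example rows are hypothetical and not taken from the ABC repository.

```python
def drop_chromosome(lines, chrom="chrM"):
    """Keep lines whose first tab-separated field is not `chrom`.

    Comment lines (starting with '#') and blank lines are kept unchanged.
    """
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        first_field = line.split("\t", 1)[0].strip()
        if first_field != chrom:
            kept.append(line)
    return kept


# Made-up example rows in a BED-like layout: chrom, start, end, name.
example_rows = [
    "#chr\tstart\tend\tname\n",
    "chr1\t1000\t1500\tenhancer_a\n",
    "chrM\t200\t700\tenhancer_b\n",
    "chr22\t5000\t5500\tenhancer_c\n",
]

filtered_rows = drop_chromosome(example_rows)
print("".join(filtered_rows), end="")
```

The same function can be run over both the enhancer list and the gene list before calling predict.py, so that no predictions are attempted for chrM.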