ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

Error running FitHiC step in dcHiC workflow for cis interactions: `Error in ids_sample[[i]] : subscript out of bounds` #81

Closed kalavattam closed 11 months ago

kalavattam commented 12 months ago

Description

I encountered an error when running Step 4 of the dcHiC workflow, which uses FitHiC to determine statistically significant pairwise interactions prior to running dloop. The FitHiC analysis completes for each sample, but the subsequent processing stage fails with the message `Error in ids_sample[[i]] : subscript out of bounds`. I believe this pertains to lines 1617–1663 of dchicf.r.
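To check whether the line range above matches a given checkout of dchicf.r, one option is to search the script for the identifier named in the error message. The following is a minimal shell sketch; the function name `find_symbol_lines` is hypothetical and not part of dcHiC.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dcHiC): print the line numbers on which a
# literal string occurs in a file, e.g. 'ids_sample' in dchicf.r.

find_symbol_lines() {
    local file="${1}"    # path to the script to search
    local symbol="${2}"  # literal text to look for (no regex interpretation)

    # -n prefixes each match with "<line>:", -F treats the pattern literally;
    # cut keeps only the line number.
    grep -n -F -- "${symbol}" "${file}" | cut -d: -f1
}

# Usage (commented out so that sourcing this file has no side effects):
# find_symbol_lines "<path_to_dcHiC>/dchicf.r" 'ids_sample'
```

If the reported line numbers fall outside the 1617–1663 range, the installed script probably differs from the version inspected for this report.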

Steps to Reproduce

  1. Run the dchicf.r script with the --pcatype fithic option.
  2. Use the following command structure:
    Rscript <path_to_dcHiC>/dchicf.r \
    --file <input_file> \
    --pcatype fithic \
    --dirovwt T \
    --diffdir <diff_directory> \
    --fithicpath <path_to_FitHiC> \
    --pythonpath python

  3. Observe the error after the FitHiC analysis completes for each sample.
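
Step 4 also depends on per-sample bias files: as the terminal excerpt in this issue shows, dchicf.r exits early unless a `biases` folder under the current path holds a `*.biases.gz` file for each sample. A quick pre-flight check could look like the sketch below. The function name `missing_biases` and the sample prefixes in the usage comment are illustrative; the `<prefix>.biases.gz` naming is an assumption based on the file names in that excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of dcHiC): report which sample
# prefixes lack a non-empty "<prefix>.biases.gz" file in a given directory.

missing_biases() {
    local bias_dir="${1}"; shift  # directory expected to hold *.biases.gz
    local prefix

    for prefix in "$@"; do
        # -s is true only if the file exists and is not empty
        if [[ ! -s "${bias_dir}/${prefix}.biases.gz" ]]; then
            echo "${prefix}"
        fi
    done
}

# Usage (commented out so that sourcing this file has no side effects):
# missing_biases biases D14_no2_40000 D14_no7_40000 ctrl_no2_40000
```

No output means every listed prefix has a bias file in place.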

Expected Behavior

The invocation of dchicf.r should complete the FitHiC analysis and proceed without errors.

Actual Behavior

The FitHiC analysis completes successfully for each sample, but the script halts with the following error:

Error in ids_sample[[i]] : subscript out of bounds
Calls: fithicformat
Execution halted
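
Because the failure comes after FitHiC itself reports success, it may help to confirm what each run directory actually contains before the script post-processes it. The sketch below counts the files under each `<sample>_fithic/fithic_result` directory; that layout is taken from the paths in the terminal excerpt in this issue, while the function name `list_fithic_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of dcHiC): for every "<sample>_fithic"
# directory under a fithic_run directory, print the directory name and the
# number of files found in its "fithic_result" subdirectory.

list_fithic_outputs() {
    local run_dir="${1}"  # e.g. DifferentialResult/<diff_directory>/fithic_run
    local d n

    for d in "${run_dir}"/*_fithic; do
        [[ -d "${d}" ]] || continue  # skip the unexpanded glob if nothing matches
        n=0
        if [[ -d "${d}/fithic_result" ]]; then
            n="$(find "${d}/fithic_result" -type f | wc -l | tr -d ' ')"
        fi
        printf '%s\t%s\n' "$(basename "${d}")" "${n}"
    done
}

# Usage (commented out so that sourcing this file has no side effects):
# list_fithic_outputs "DifferentialResult/<diff_directory>/fithic_run"
```

A sample with a count of 0, or with file names that differ from the other samples, would be a natural place to start looking.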

Environment

Additional Context

Here's an excerpt from the terminal output showing the error:

Excerpt

```txt
❯ # Step 3
❯ Rscript "${d_dcHiC}/dchicf.r" \
>     --file "${f_infile}" \
>     --pcatype analyze \
>     --dirovwt T \
>     --diffdir "${d_diff}"
Running intra sample differential calls for D14 ctrl MM samples
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr1_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr10_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr11_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr12_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr13_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr14_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr15_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr16_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr17_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr18_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr19_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr2_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr20_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr21_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr22_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr3_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr4_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr5_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr6_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr7_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr8_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chr9_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/D14_data/intra_chrX_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr1_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr10_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr11_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr12_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr13_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr14_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr15_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr16_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr17_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr18_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr19_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr2_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr20_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr21_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr22_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr3_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr4_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr5_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr6_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr7_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr8_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chr9_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/ctrl_data/intra_chrX_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr1_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr10_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr11_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr12_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr13_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr14_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr15_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr16_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr17_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr18_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr19_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr2_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr20_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr21_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr22_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr3_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr4_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr5_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr6_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr7_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr8_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chr9_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/MM_data/intra_chrX_combined.pcOri.bedGraph and pcQnm.bedGraph files
Wrote intra_sample_chr1_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr10_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr11_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr12_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr13_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr14_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr15_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr16_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr17_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr18_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr19_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr2_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr20_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr21_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr22_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr3_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr4_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr5_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr6_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr7_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr8_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chr9_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote intra_sample_chrX_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/diff_D14_ctrl_MM_40000/pcOri & DifferentialResult/diff_D14_ctrl_MM_40000/pcQnm folders
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr1_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr10_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr11_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr12_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr13_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr14_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr15_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr16_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr17_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr18_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr19_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr2_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr20_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr21_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr22_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr3_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr4_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr5_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr6_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr7_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr8_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chr9_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_chrX_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_combined.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_group.pcQnm.bedGraph file
Wrote DifferentialResult/diff_D14_ctrl_MM_40000/fdr_result/differential.intra_sample_group.pcOri.bedGraph file

❯ # Step 4
❯ Rscript "${d_dcHiC}/dchicf.r" \
>     --file "${f_infile}" \
>     --pcatype fithic \
>     --dirovwt T \
>     --diffdir "${d_diff}" \
>     --fithicpath "${a_FitHiC}" \
>     --pythonpath "python"
Finding significant loops from intra sample D14 ctrl MM replicates
Creating inputs for fithic run D14_no2_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
11 4 4 1 chr1 140000 chr1 140000 1 4 4
29 5 5 1 chr1 180000 chr1 180000 1 5 5
59 7 7 2 chr1 260000 chr1 260000 2 7 7
85 8 8 2 chr1 300000 chr1 300000 2 8 8
111 15 15 1 chr1 580000 chr1 580000 1 15 15
112 15 17 1 chr1 580000 chr1 660000 1 15 17
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 0
2 chr1 40000 80000 2 0 1 60000 2 0
3 chr1 80000 120000 3 0 1 100000 3 0
4 chr1 120000 160000 4 0 1 140000 4 1
5 chr1 160000 200000 5 0 1 180000 5 1
6 chr1 200000 240000 6 0 1 220000 6 0
Creating inputs for fithic run D14_no7_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
9 4 13 1 chr1 140000 chr1 500000 1 4 13
23 5 5 2 chr1 180000 chr1 180000 2 5 5
44 7 7 1 chr1 260000 chr1 260000 1 7 7
53 8 8 2 chr1 300000 chr1 300000 2 8 8
68 14 208 1 chr1 540000 chr1 8300000 1 14 208
71 15 16 1 chr1 580000 chr1 620000 1 15 16
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 0
2 chr1 40000 80000 2 0 1 60000 2 0
3 chr1 80000 120000 3 0 1 100000 3 0
4 chr1 120000 160000 4 0 1 140000 4 1
5 chr1 160000 200000 5 0 1 180000 5 2
6 chr1 200000 240000 6 0 1 220000 6 0
Creating inputs for fithic run ctrl_no2_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
1 1 5 1 chr1 20000 chr1 180000 1 1 5
9 3 7 1 chr1 100000 chr1 260000 1 3 7
13 4 4 2 chr1 140000 chr1 140000 2 4 4
14 4 30 1 chr1 140000 chr1 1180000 1 4 30
30 5 5 4 chr1 180000 chr1 180000 4 5 5
47 7 7 7 chr1 260000 chr1 260000 7 7 7
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 1
2 chr1 40000 80000 2 0 1 60000 2 0
3 chr1 80000 120000 3 0 1 100000 3 1
4 chr1 120000 160000 4 0 1 140000 4 3
5 chr1 160000 200000 5 0 1 180000 5 5
6 chr1 200000 240000 6 0 1 220000 6 0
Creating inputs for fithic run ctrl_no7_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
1 1 5 1 chr1 20000 chr1 180000 1 1 5
5 3 8 1 chr1 100000 chr1 300000 1 3 8
25 5 5 2 chr1 180000 chr1 180000 2 5 5
65 7 7 2 chr1 260000 chr1 260000 2 7 7
91 8 8 1 chr1 300000 chr1 300000 1 8 8
105 13 13 1 chr1 500000 chr1 500000 1 13 13
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 1
2 chr1 40000 80000 2 0 1 60000 2 0
3 chr1 80000 120000 3 0 1 100000 3 1
4 chr1 120000 160000 4 0 1 140000 4 0
5 chr1 160000 200000 5 0 1 180000 5 3
6 chr1 200000 240000 6 0 1 220000 6 0
Creating inputs for fithic run MM_no2_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
2 2 2 1 chr1 60000 chr1 60000 1 2 2
3 3 3 1 chr1 100000 chr1 100000 1 3 3
4 3 7 1 chr1 100000 chr1 260000 1 3 7
8 4 4 2 chr1 140000 chr1 140000 2 4 4
13 5 5 3 chr1 180000 chr1 180000 3 5 5
34 7 7 2 chr1 260000 chr1 260000 2 7 7
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 0
2 chr1 40000 80000 2 0 1 60000 2 1
3 chr1 80000 120000 3 0 1 100000 3 2
4 chr1 120000 160000 4 0 1 140000 4 2
5 chr1 160000 200000 5 0 1 180000 5 3
6 chr1 200000 240000 6 0 1 220000 6 0
Creating inputs for fithic run MM_no7_40000
Reading the bed and interaction matrix file
A B C chr1 fragmentMid1 chr2 fragmentMid2 contactCount correct_A correct_B
16 4 4 2 chr1 140000 chr1 140000 2 4 4
17 4 7 1 chr1 140000 chr1 260000 1 4 7
18 4 14 1 chr1 140000 chr1 540000 1 4 14
33 5 5 1 chr1 180000 chr1 180000 1 5 5
62 7 8 1 chr1 260000 chr1 300000 1 7 8
81 8 8 1 chr1 300000 chr1 300000 1 8 8
Started calculating Marginalized Contact Count
chr start end index extraField mappable mid correct_index marginalizedContactCount
1 chr1 0 40000 1 0 1 20000 1 0
2 chr1 40000 80000 2 0 1 60000 2 0
3 chr1 80000 120000 3 0 1 100000 3 0
4 chr1 120000 160000 4 0 1 140000 4 4
5 chr1 160000 200000 5 0 1 180000 5 1
6 chr1 200000 240000 6 0 1 220000 6 0
Fithic requires a bias file. Please check the link for more details
https://github.com/ay-lab/fithic
Please generate the bias files for each sample provided in the input.txt file
Create an additional folder 'biases' under current path and dump all the *.biases.gz files inside it
Rerun the step again
Error in FUN(X[[i]], ...) : Exit!
Calls: fithicformat -> lapply -> FUN
Execution halted

❯ cd "${d_base}/${d_proj}/${d_infile}/DifferentialResult/${d_diff}/fithic_run" \
>     || echo "cd'ing failed; check on this"
❯ unset exps && typeset -a exps
❯ while IFS=" " read -r -d $'\0'; do
>     exps+=( "${REPLY}" )
> done < <(find . -type d -name "*_${res}_*" -print0 | sort -z)
❯ for exp in "${exps[@]}"; do
>     if [[ ! -f "${exp}/biases.txt.gz" ]]; then
>         python "${a_HiCKRy}" \
>             -i "${exp}/interactions.txt.gz" \
>             -f "${exp}/fragments.txt.gz" \
>             -o "${exp}/biases.txt.gz"
>         echo ""
>     fi
> done
Creating sparse matrix...
Sparse matrix creation took 2.4459965229034424 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69830 rows and 69830 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.8427719427877771
Median 1.1189124367583258e-09
Std. Dev. 253.6527985474652

Creating sparse matrix...
Sparse matrix creation took 2.881424903869629 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69849 rows and 69849 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.8432733414260832
Median 6.088974504995514e-09
Std. Dev. 253.72096755584406

Creating sparse matrix...
Sparse matrix creation took 2.408437490463257 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69713 rows and 69713 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.8396843827518868
Median 2.6012449355303677e-09
Std. Dev. 222.08428426294967

Creating sparse matrix...
Sparse matrix creation took 2.08026385307312 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69476 rows and 69476 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.8334300944740592
Median 6.174390743383673e-09
Std. Dev. 252.36695742918135

Creating sparse matrix...
Sparse matrix creation took 1.707571268081665 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69610 rows and 69610 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.836966274344223
Median 5.184835662519182e-09
Std. Dev. 252.85294611927932

Creating sparse matrix...
Sparse matrix creation took 2.4464430809020996 seconds
Removing 0.05 percent of most sparse bins
... corresponds to 3789 total rows
... corresponds to all bins with less than or equal to 0.0 total interactions
Sparse rows removed
Initial matrix size: 75788 rows and 75788 columns
New matrix size: 69700 rows and 69700 columns
Normalizing with KR Algorithm
WARNING... Bias vector has a median outside of typical range (0.5, 2).
Consider running with a larger -x option if problems occur
Mean 0.8393413205256771
Median 9.560161685721065e-09
Std. Dev. 253.1804869412731

❯ if [[ ! -d "${d_base}/${d_proj}/${d_infile}/biases" ]]; then
>     mkdir -p "${d_base}/${d_proj}/${d_infile}/biases"
> fi
mkdir: created directory '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases'
❯ for exp in "${exps[@]}"; do
>     string_1="${exp#./}"  # echo "${string}"
>     string_2="${string_1%_fithic}"  # echo "${string}"
>
>     if [[ ! -f "${d_base}/${d_proj}/${d_infile}/biases/${string_2}.biases.gz" ]]; then
>         cp \
>             "${string_1}/biases.txt.gz" \
>             "${d_base}/${d_proj}/${d_infile}/biases/${string_2}.biases.gz"
>     fi
> done
'ctrl_no2_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no2_40000.biases.gz'
'ctrl_no7_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no7_40000.biases.gz'
'D14_no2_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no2_40000.biases.gz'
'D14_no7_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no7_40000.biases.gz'
'MM_no2_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no2_40000.biases.gz'
'MM_no7_40000_fithic/biases.txt.gz' -> '/home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no7_40000.biases.gz'
❯ check_directory=true
❯ if ${check_directory}; then
>     ls -lhaFG "${d_base}/${d_proj}/${d_infile}/biases"
>     echo ""
> fi
total 7.2M
drwxrws--- 2 kalavatt 246 Dec 7 10:08 ./
drwxrws--- 11 kalavatt 1.3K Dec 7 10:08 ../
-rw-rw---- 1 kalavatt 828K Dec 7 10:08 ctrl_no2_40000.biases.gz
-rw-rw---- 1 kalavatt 826K Dec 7 10:08 ctrl_no7_40000.biases.gz
-rw-rw---- 1 kalavatt 826K Dec 7 10:08 D14_no2_40000.biases.gz
-rw-rw---- 1 kalavatt 823K Dec 7 10:08 D14_no7_40000.biases.gz
-rw-rw---- 1 kalavatt 818K Dec 7 10:08 MM_no2_40000.biases.gz
-rw-rw---- 1 kalavatt 835K Dec 7 10:08 MM_no7_40000.biases.gz

❯ cd "${d_base}/${d_proj}/${d_infile}" \
>     || echo "cd'ing failed; check on this"
❯ Rscript "${d_dcHiC}/dchicf.r" \
>     --file "${f_infile}" \
>     --pcatype fithic \
>     --dirovwt T \
>     --diffdir "diff_D14_ctrl_MM_${res}" \
>     --fithicpath "${a_FitHiC}" \
>     --pythonpath "python"
Finding significant loops from intra sample D14 ctrl MM replicates
[1] "folder exists"
Fithic file already exists for D14_no2_40000 , skipping
[1] "folder exists"
Fithic file already exists for D14_no7_40000 , skipping
[1] "folder exists"
Fithic file already exists for ctrl_no2_40000 , skipping
[1] "folder exists"
Fithic file already exists for ctrl_no7_40000 , skipping
[1] "folder exists"
Fithic file already exists for MM_no2_40000 , skipping
[1] "folder exists"
Fithic file already exists for MM_no7_40000 , skipping
python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no2_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fithic_result -r 40000
GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/interactions.txt.gz
Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 40.0 kb
Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no2_40000.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 2000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================
Reading the contact counts file to generate bins...
Interactions file read. Time took 5.3157219886779785
Fragments file read. Time took 0.1039571762084961
Bias file read. Time took 0.2922706604003906
Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt
Spline fit Pass 1 starting...
Outlier threshold is... 1.418022154043717e-07
Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt
Number of outliers is... 0
Spline fit Pass 1 completed. Time took 21.363317251205444
=========================
Fit-Hi-C completed successfully
python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no7_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fithic_result -r 40000
GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/interactions.txt.gz
Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 40.0 kb
Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/D14_no7_40000.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 2000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================
Reading the contact counts file to generate bins...
Interactions file read. Time took 4.570132255554199
Fragments file read. Time took 0.12056660652160645
Bias file read. Time took 0.28597235679626465
Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt
Spline fit Pass 1 starting...
Outlier threshold is... 1.4228997501672618e-07
Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt
Number of outliers is... 0
Spline fit Pass 1 completed. Time took 17.940237760543823
=========================
Fit-Hi-C completed successfully
python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no2_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fithic_result -r 40000
GIVEN FIT-HI-C ARGUMENTS
=========================
Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fragments.txt.gz
Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/interactions.txt.gz
Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fithic_result
Fixed size option detected... Fast version of FitHiC will be used
Resolution is 40.0 kb
Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no2_40000.biases.gz
The number of spline passes is 1
The number of bins is 100
The number of reads required to consider an interaction is 1
The name of the library for outputted files will be FitHiC
Upper Distance threshold is 2000000
Lower Distance threshold is 0
Only intra-chromosomal regions will be analyzed
Lower bound of bias values is 0.5
Upper bound of bias values is 2
All arguments processed. Running FitHiC now...
=========================
Reading the contact counts file to generate bins...
Interactions file read. Time took 5.569983005523682
Fragments file read. Time took 0.11757349967956543
Bias file read. Time took 0.29406166076660156
Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt
Spline fit Pass 1 starting...
Outlier threshold is... 1.415626535070024e-07
Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no2_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt
Number of outliers is... 0
Spline fit Pass 1 completed.
Time took 22.050061464309692 ========================= Fit-Hi-C completed successfully python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no7_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fithic_result -r 40000 GIVEN FIT-HI-C ARGUMENTS ========================= Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fragments.txt.gz Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/interactions.txt.gz Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fithic_result Fixed size option detected... Fast version of FitHiC will be used Resolution is 40.0 kb Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/ctrl_no7_40000.biases.gz The number of spline passes is 1 The number of bins is 100 The number of reads required to consider an interaction is 1 The name of the library for outputted files will be FitHiC Upper Distance threshold is 2000000 Lower Distance threshold is 0 Only intra-chromosomal regions will be analyzed Lower bound of bias values is 0.5 Upper bound of bias values is 2 All arguments processed. Running FitHiC now... ========================= Reading the contact counts file to generate bins... Interactions file read. Time took 6.58297324180603 Fragments file read. Time took 0.10645461082458496 Bias file read. 
Time took 0.29202890396118164 Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt Spline fit Pass 1 starting... Outlier threshold is... 1.4152382666841024e-07 Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/ctrl_no7_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt Number of outliers is... 0 Spline fit Pass 1 completed. Time took 25.76362919807434 ========================= Fit-Hi-C completed successfully python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no2_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fithic_result -r 40000 GIVEN FIT-HI-C ARGUMENTS ========================= Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fragments.txt.gz Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/interactions.txt.gz Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fithic_result Fixed size option detected... 
Fast version of FitHiC will be used Resolution is 40.0 kb Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no2_40000.biases.gz The number of spline passes is 1 The number of bins is 100 The number of reads required to consider an interaction is 1 The name of the library for outputted files will be FitHiC Upper Distance threshold is 2000000 Lower Distance threshold is 0 Only intra-chromosomal regions will be analyzed Lower bound of bias values is 0.5 Upper bound of bias values is 2 All arguments processed. Running FitHiC now... ========================= Reading the contact counts file to generate bins... Interactions file read. Time took 3.753464698791504 Fragments file read. Time took 0.10889172554016113 Bias file read. Time took 0.2916555404663086 Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt Spline fit Pass 1 starting... Outlier threshold is... 1.4201378385786125e-07 Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no2_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt Number of outliers is... 0 Spline fit Pass 1 completed. 
Time took 14.641690492630005 ========================= Fit-Hi-C completed successfully python /home/kalavatt/tsukiyamalab/kalavatt/2023_rDNA/src/fithic/fithic/fithic.py -i DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/interactions.txt.gz -f DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fragments.txt.gz -t /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no7_40000.biases.gz -U 2000000 -o DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fithic_result -r 40000 GIVEN FIT-HI-C ARGUMENTS ========================= Reading fragments file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fragments.txt.gz Reading interactions file from: DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/interactions.txt.gz Output path created DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fithic_result Fixed size option detected... Fast version of FitHiC will be used Resolution is 40.0 kb Reading bias file from: /fh/fast/tsukiyama_t/grp/tsukiyamalab/kalavatt/2023_rDNA/results/2023-1018_work_Hi-C_align-process/09_dcHiC/biases/MM_no7_40000.biases.gz The number of spline passes is 1 The number of bins is 100 The number of reads required to consider an interaction is 1 The name of the library for outputted files will be FitHiC Upper Distance threshold is 2000000 Lower Distance threshold is 0 Only intra-chromosomal regions will be analyzed Lower bound of bias values is 0.5 Upper bound of bias values is 2 All arguments processed. Running FitHiC now... ========================= Reading the contact counts file to generate bins... Interactions file read. Time took 5.746699333190918 Fragments file read. Time took 0.10715699195861816 Bias file read. 
Time took 0.2898731231689453 Writing DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fithic_result/FitHiC.fithic_pass1.res40000.txt Spline fit Pass 1 starting... Outlier threshold is... 1.4182888345211502e-07 Writing p-values and q-values to file DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/MM_no7_40000_fithic/fithic_result/FitHiC.spline_pass1.significances.txt Number of outliers is... 0 Spline fit Pass 1 completed. Time took 22.723848819732666 ========================= Fit-Hi-C completed successfully [1] 1 Taking input= as a system command ('gzip -dc DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no2_40000_fithic/fithic_result/FitHiC.spline_pass1.res40000.significances.txt.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message. Taking input= as a system command ('gzip -dc DifferentialResult/diff_D14_ctrl_MM_40000/fithic_run/D14_no7_40000_fithic/fithic_result/FitHiC.spline_pass1.res40000.significances.txt.gz') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message. Error in ids_sample[[i]] : subscript out of bounds Calls: fithicformat Execution halted ```


The issue seems to occur after FitHiC successfully completes its analysis for each sample but fails during the subsequent processing stage. It appears to be related to the handling of sample IDs or indices in the script.

Any insights or guidance on how to resolve this issue would be greatly appreciated.

ay-lab commented 12 months ago

Hi,

Generally, an "ids_sample[[i]]: subscript out of bounds" error means that the sample names in the index file do not match the samples for which FitHiC was actually run. Let me try to debug this issue. Can you please share the contents of your input file and the results generated under the "DifferentialResult/folderName/fithic_run" directory? For example, my demo index file looks like this:

ES_1_100Kb.matrix        ES_1_100Kb_abs.bed       ES_1_100Kb      ES
ES_2_100Kb.matrix        ES_2_100Kb_abs.bed       ES_2_100Kb      ES
ES_3_100Kb.matrix        ES_3_100Kb_abs.bed       ES_3_100Kb      ES
ES_4_100Kb.matrix        ES_4_100Kb_abs.bed       ES_4_100Kb      ES
NPC_2_100Kb.matrix     NPC_2_100Kb_abs.bed       NPC_2_100Kb     NPC
NPC_3_100Kb.matrix     NPC_3_100Kb_abs.bed       NPC_3_100Kb     NPC
NPC_4_100Kb.matrix     NPC_4_100Kb_abs.bed       NPC_4_100Kb     NPC

And my fithic_run directory looks like this: [image not shown]

For each replicate, dcHiC runs the FitHiC calls and then combines the replicate-wise significant results to generate the FithicResult.txt file. This file looks like this:

ES      NPC
chr10_100000000_chr10_100000000 1       1
chr10_100000000_chr10_100100000 1       1
chr10_100000000_chr10_100400000 1       1
chr10_100000000_chr10_102200000 0       1
chr10_100000000_chr10_102300000 1       1
chr10_100000000_chr10_102400000 0       1
chr10_100000000_chr10_102500000 1       1

While merging the replicates for a sample, it uses the names from the index file, so a mismatch there can cause an error.
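The name-matching requirement described above can be checked before rerunning anything. The following is a minimal sketch, not part of dcHiC: `missing_fithic_dirs` is a hypothetical helper, the column names are chosen for illustration, and it assumes the four-column index layout and the `<prefix>_fithic` directory naming shown in this thread.

```r
# Hypothetical helper (not part of dcHiC): list the <prefix>_fithic directories
# that the index file implies but that are absent from the fithic_run directory.
missing_fithic_dirs <- function(index_file, fithic_run_dir) {
  # Assumed index layout: matrix file, bed file, replicate prefix, sample name
  idx <- read.table(index_file, header = FALSE, stringsAsFactors = FALSE,
                    col.names = c("mat", "bed", "prefix", "sample"))
  expected <- paste0(idx$prefix, "_fithic")
  present  <- basename(list.dirs(fithic_run_dir, recursive = FALSE))
  setdiff(expected, present)
}

# Self-contained demo in a temporary directory with one replicate missing
demo_dir <- file.path(tempdir(), "fithic_run_demo")
dir.create(file.path(demo_dir, "D14_no2_40000_fithic"),
           recursive = TRUE, showWarnings = FALSE)
index_file <- tempfile(fileext = ".txt")
writeLines(c(
  "a.matrix\ta_abs.bed\tD14_no2_40000\tD14",
  "b.matrix\tb_abs.bed\tD14_no7_40000\tD14"
), index_file)

missing <- missing_fithic_dirs(index_file, demo_dir)
print(missing)
```

An empty result means every replicate prefix in the index file has a matching FitHiC output directory, in which case the cause of the error lies elsewhere.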

kalavattam commented 12 months ago

Thank you for your assistance.

Contents of sample infile:

HC-mat_D14_no2_40000.matrix HC-mat_D14_no2_40000_abs.bed    D14_no2_40000   D14
HC-mat_D14_no7_40000.matrix HC-mat_D14_no7_40000_abs.bed    D14_no7_40000   D14
HC-mat_Tx-ctrl_no2_40000.matrix HC-mat_Tx-ctrl_no2_40000_abs.bed    ctrl_no2_40000  ctrl
HC-mat_Tx-ctrl_no7_40000.matrix HC-mat_Tx-ctrl_no7_40000_abs.bed    ctrl_no7_40000  ctrl
HC-mat_Tx-MM_no2_40000.matrix   HC-mat_Tx-MM_no2_40000_abs.bed  MM_no2_40000    MM
HC-mat_Tx-MM_no7_40000.matrix   HC-mat_Tx-MM_no7_40000_abs.bed  MM_no7_40000    MM

Contents of DifferentialResult/folderName/fithic_run directory:

drwxrws--- 3 kalavatt 133 Dec  7 10:14 ctrl_no2_40000_fithic/
drwxrws--- 3 kalavatt 133 Dec  7 10:15 ctrl_no7_40000_fithic/
drwxrws--- 3 kalavatt 133 Dec  7 10:13 D14_no2_40000_fithic/
drwxrws--- 3 kalavatt 133 Dec  7 10:14 D14_no7_40000_fithic/
drwxrws--- 3 kalavatt 133 Dec  7 10:15 MM_no2_40000_fithic/
drwxrws--- 3 kalavatt 133 Dec  7 10:15 MM_no7_40000_fithic/
kalavattam commented 12 months ago

OK, I believe I have identified the source of the issue: There are no significant interactions in the given FitHiC.spline_pass1.*.significances.txt.gz file. Please see the following:

> data.table::fread(
+   cmd = paste0(
+     "gzip -dc ",diffdir,"/fithic_run/",data_rep$prefix[j],
+     "_fithic/fithic_result/FitHiC.spline_pass1.res",
+     as.integer(resolution),".significances.txt.gz"),
+   h=T
+ )
         chr1 fragmentMid1 chr2 fragmentMid2 contactCount p-value q-value bias1 bias2 ExpCC
      1: chr1       140000 chr1       140000            1       1       1    -1    -1     0
      2: chr1       180000 chr1       180000            1       1       1    -1    -1     0
      3: chr1       260000 chr1       260000            2       1       1    -1    -1     0
      4: chr1       300000 chr1       300000            2       1       1    -1    -1     0
      5: chr1       580000 chr1       580000            1       1       1    -1    -1     0
     ---                                                                                   
1418917: chrX    155620000 chrX    155660000            7       1       1    -1    -1     0
1418918: chrX    155620000 chrX    155700000            2       1       1    -1    -1     0
1418919: chrX    155660000 chrX    155660000           40       1       1    -1    -1     0
1418920: chrX    155660000 chrX    155700000            5       1       1    -1    -1     0
1418921: chrX    155700000 chrX    155700000           16       1       1    -1    -1     0

This is likely a consequence of working with sparse Hi-C matrices at a resolution that may be too fine for FitHiC2 analyses, resulting in no significant interactions being detected.

Stepping through and evaluating the subsequent lines of function fithicformat in dchicf.r, we see the following:

> j <- 1  # For debugging, assign variable j to 1
> mat_rep[[j]] <- data.table::fread(
+   cmd = paste0(
+     "gzip -dc ",diffdir,"/fithic_run/",data_rep$prefix[j],
+     "_fithic/fithic_result/FitHiC.spline_pass1.res",
+     as.integer(resolution),".significances.txt.gz"),
+   h=T
+ )
> colnames(mat_rep[[j]])[c(6,7)] <- c("pval","qval")
> mat_rep[[j]] <- mat_rep[[j]][mat_rep[[j]]$qval < fdr_thr,]
> mat_rep[[j]]
Empty data.table (0 rows and 10 cols): chr1,fragmentMid1,chr2,fragmentMid2,contactCount,pval...

I believe this empty result ("Empty data.table (0 rows and 10 cols)") is where the problem starts: the subsequent operations expect a table with at least one row.

By stepping through L1630–L1651 with a no-row mat_rep[[j]] data.table object, we can reproduce the error:

> mat_rep[[j]][,"id"]  <- paste0(
+   mat_rep[[j]]$chr1,"_",
+   as.integer(mat_rep[[j]]$fragmentMid1-(resolution/2)),"_",
+   mat_rep[[j]]$chr2,"_",
+   as.integer(mat_rep[[j]]$fragmentMid2-(resolution/2))
+ )
> mat_rep[[j]][,"id"]
Empty data.table (0 rows and 1 cols): id

> mat_rep[[j]][,"sig"] <- 1
> data.table::setkey(mat_rep[[j]],id)
> ids_rep[[j]] <- as.character(unlist(mat_rep[[j]][,11]))
> ids_rep[[j]]
character(0)

> ids_rep <- sort(unique(as.character(unlist(ids_rep))))
> mat_sample[[i]] <- matrix(0, length(ids_rep), nrow(data_rep))
> rownames(mat_sample[[i]]) <- ids_rep
> colnames(mat_sample[[i]]) <- as.character(data_rep$prefix)

> mat_sample[[i]]
[,1] [,2]
> rownames(mat_sample[[i]])
NULL
> colnames(mat_sample[[i]])
[1] "D14_no2_40000" "D14_no7_40000"
> mat_sample[[i]]
D14_no2_40000 D14_no7_40000

> j <- 1  # For debugging, assign variable j to 1
> mat_sample[[i]][,j] <- as.integer(unlist(mat_rep[[j]][.(ids_rep)][,12]))
> mat_sample[[i]][,j]
numeric(0)

> freq_thr <- ifelse(nrow(data_rep) == 1, 0, 0.5)
> mat_sample[[i]] <- mat_sample[[i]][
+   which((apply(mat_sample[[i]], 1, sum)/nrow(data_rep)) >= freq_thr),
+ ]
> if (nrow(data_rep) > 1) {
+   ids_sample[[i]] <- rownames(mat_sample[[i]])
+ } else {
+   ids_sample[[i]] <- names(mat_sample[[i]])
+ }
> mat_sample[[i]] <- data.table::as.data.table(mat_sample[[i]])
> mat_sample[[i]][,"id"] <- ids_sample[[i]]
Error in ids_sample[[i]] : subscript out of bounds

> ids_sample[[i]]
Error in ids_sample[[i]] : subscript out of bounds
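The final error can be reproduced in isolation. This is a minimal sketch that assumes `ids_sample` starts out as an empty list (the script's actual initialisation is not shown in this thread): a zero-row matrix has `NULL` row names, and assigning `NULL` with `[[<-` does not create a list element, so the later read fails.

```r
# Assumption: ids_sample is initialised as an empty list, as the [[i]] indexing suggests
ids_sample <- list()
i <- 1

# A zero-row matrix, like mat_sample[[i]] when no interaction passes the FDR filter
empty_mat <- matrix(0, nrow = 0, ncol = 2)

# rownames() of a zero-row matrix is NULL; assigning NULL via [[<- stores nothing
ids_sample[[i]] <- rownames(empty_mat)
n_stored <- length(ids_sample)

# Reading the element back therefore raises the error seen in the log
msg <- tryCatch(ids_sample[[i]], error = function(e) conditionMessage(e))
print(msg)

# Wrapping the value in list() keeps an explicit NULL placeholder instead
ids_safe <- list()
ids_safe[i] <- list(rownames(empty_mat))
n_safe <- length(ids_safe)
```

Keeping a placeholder avoids the crash but only postpones the problem; an explicit check for zero significant interactions gives the user a clearer message.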

It might be prudent to include a check within the for loop at L1625–L1633, perhaps after mat_rep[[j]] <- mat_rep[[j]][mat_rep[[j]]$qval < fdr_thr,] (L1628):

for(j in 1:nrow(data_rep)) {
    mat_rep[[j]] <- data.table::fread(cmd = paste0("gzip -dc ",diffdir,"/fithic_run/",data_rep$prefix[j],"_fithic/fithic_result/FitHiC.spline_pass1.res",as.integer(resolution),".significances.txt.gz"), header = TRUE)  # cmd= avoids the fread warning seen in the log
    colnames(mat_rep[[j]])[c(6,7)] <- c("pval","qval")
    mat_rep[[j]] <- mat_rep[[j]][mat_rep[[j]]$qval < fdr_thr,]
    if (nrow(mat_rep[[j]]) == 0) {  # Perhaps here?
        stop("No significant interactions were found; exiting now.")
    }  
    mat_rep[[j]][,"id"]  <- paste0(mat_rep[[j]]$chr1,"_",as.integer(mat_rep[[j]]$fragmentMid1-(resolution/2)),"_",mat_rep[[j]]$chr2,"_",as.integer(mat_rep[[j]]$fragmentMid2-(resolution/2)))
    mat_rep[[j]][,"sig"] <- 1
    data.table::setkey(mat_rep[[j]],id)
    ids_rep[[j]] <- as.character(unlist(mat_rep[[j]][,11]))
}

This check could help prevent the propagation of an empty data structure through the rest of the function.

ay-lab commented 12 months ago

The issue is likely that your bias values are all -1. FitHiC ignores any pair in which a locus has a very high or very low bias value (the bounds are configurable), and pairs with a bias of -1 will never be assigned a p-value. The "1" there is a placeholder, not a verdict of "not significant". We could have put N/A, I guess, but that would lead to problems downstream. Please check your normalization first.
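A quick way to spot this situation is to measure how many rows of a significances table carry a bias of -1. The sketch below is illustrative only: `fraction_unnormalized` is a hypothetical helper and the table is toy data using the `bias1`/`bias2` column names from the FitHiC output shown earlier. On real data the table would come from reading the `.significances.txt.gz` file, for example with `data.table::fread(cmd = ...)` as above.

```r
# Hypothetical helper: share of interactions where either locus has bias -1
fraction_unnormalized <- function(sig) {
  mean(sig$bias1 == -1 | sig$bias2 == -1)
}

# Toy table (made-up values) with the relevant columns of a significances file
sig <- data.frame(
  chr1 = "chr1", fragmentMid1 = c(140000, 180000, 260000),
  chr2 = "chr1", fragmentMid2 = c(140000, 180000, 300000),
  bias1 = c(-1, -1, 0.9),
  bias2 = c(-1, 1.1, 1.0)
)

frac <- fraction_unnormalized(sig)
print(frac)
```

A fraction at or near 1, as in the output posted in this thread, points to the normalization step rather than to FitHiC or dcHiC.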

kalavattam commented 11 months ago

Thanks for the response. I agree that the "-1" value is interpretable as it is, so there's no need for any changes in that regard.

I'm working with relatively shallow-sequenced Hi-C data. I managed to resolve the bias issue by setting the HiCKRy.py -x argument to a value higher than 0.05. This adjustment effectively addressed the problem.

Considering this, the issue can likely be closed. However, to enhance user experience, it might be beneficial to add a small check within the loop at L1625–L1633 of dchicf.r, possibly after mat_rep[[j]] <- mat_rep[[j]][mat_rep[[j]]$qval < fdr_thr,] on L1628. Such a check could provide more informative error messaging for similar situations in the future, although it's not essential.

ay-lab commented 11 months ago

Thanks a lot for the suggestion! I have added a check.