ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

How to run dcHiC without replicates #36

Closed (BenxiaHu closed 1 year ago)

BenxiaHu commented 2 years ago

Hello, thanks for developing dcHiC. I am trying to analyze Hi-C data without replicates, following these instructions:

Running dcHiC Without Replicates

Differential calling with dcHiC "learns" the amount that PC (compartment) values vary between biological replicate datasets and uses those parameters for significance thresholds. However, it is also possible to run dcHiC from start to finish without replicates (for users using HiC-Pro, this means using the allValidPairs file).

In the input.txt file, put the same name for the "replicate" and "cell line" columns.

    HMEC   HMEC   /path/to/HMEC
    MCF7   MCF7   /path/to/MCF7
    MCF10  MCF10  /path/to/MCF10

After building the input.txt file, I tried to run dchic.py but got the following errors:

    Traceback (most recent call last):
      File "/data/software/dcHiC-dcHiCv2.0/dchic/run.py", line 421, in <module>
        filenum = int(file.split("_")[3].split(".")[0])
    ValueError: invalid literal for int() with base 10: 'exp'
    python /data/software/dcHiC-dcHiCv2.0/dchic/makeBedGraph.py -eigfile chr_1/hmfa_HiC3-CD8T-Health2-Veh_combine_exp_1.txt -chr 1 -exp HiC3-CD8T-Health2-Veh_combine
    Traceback (most recent call last):
      File "/data/software/dcHiC-dcHiCv2.0/dchic/makeBedGraph.py", line 40, in <module>
        with open(results.eigfile, "r") as file:
    FileNotFoundError: [Errno 2] No such file or directory: 'chr_1/hmfa_HiC3-CD8T-Health2-Veh_combine_exp_1.txt'
    python /data/software/dcHiC-dcHiCv2.0/dchic/makeBedGraph.py -eigfile chr_1/hmfa_HiC4-CD8T-Health2-GYY_combine_exp_2.txt -chr 1 -exp HiC4-CD8T-Health2-GYY_combine
    Traceback (most recent call last):
      File "/data/software/dcHiC-dcHiCv2.0/dchic/makeBedGraph.py", line 40, in <module>
        with open(results.eigfile, "r") as file:
    FileNotFoundError: [Errno 2] No such file or directory: 'chr_1/hmfa_HiC4-CD8T-Health2-GYY_combine_exp_2.txt'
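A note on the first traceback: the failing expression takes the fourth underscore-separated field of the file name and converts it to an integer. The sketch below replays that expression on the file name from the log. It suggests that the extra underscore in the sample name (`..._combine`) shifts the fields so that `'exp'` lands where the number was expected. This is only a reading of the traceback, not a confirmed diagnosis, and `trailing_index` is a hypothetical helper for illustration, not part of dcHiC.

```python
# File name copied from the traceback above.
name = "hmfa_HiC3-CD8T-Health2-Veh_combine_exp_1.txt"

# The expression at run.py line 421 reads the 4th "_"-separated field.
fields = name.split("_")
print(fields)     # ['hmfa', 'HiC3-CD8T-Health2-Veh', 'combine', 'exp', '1.txt']
print(fields[3])  # 'exp', so int(fields[3]) raises ValueError


def trailing_index(filename: str) -> int:
    """Hypothetical helper: read the number after the LAST underscore.

    This stays correct even when the sample name itself contains underscores.
    """
    stem = filename.rsplit(".", 1)[0]   # drop the ".txt" extension
    return int(stem.rsplit("_", 1)[1])  # text after the final "_"


print(trailing_index(name))  # 1
```

If this reading is right, a sample name without underscores would avoid the shift in that version of the script.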

ay-lab commented 2 years ago

Hi, thanks for using dcHiC. Yes, it is possible to run dcHiC without replicates. The input.txt file should have four columns -

matrix.file bed.file replicate_name sample_name

For your case, this should be something like the following -

    HMEC.mat   HMEC.bed   HMEC   HMEC_sample
    MCF7.mat   MCF7.bed   MCF7   MCF7_sample
    MCF10.mat  MCF10.bed  MCF10  MCF10_sample

Even if you don't have replicates, treat each sample as a replicate (third column), and please make sure that the names in the third and fourth columns are different (that is why I have added "_sample" to the Hi-C names).
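The two rules above (four columns per row, and a replicate name that differs from the sample name) can be checked before launching dcHiC. The following is a minimal sketch, assuming the columns are whitespace-separated; `check_input_rows` is a hypothetical helper written for this example and is not part of dcHiC.

```python
def check_input_rows(text: str) -> list[str]:
    """Return a list of problems found in dcHiC-style input.txt content.

    Assumed layout per row (whitespace-separated):
    matrix.file  bed.file  replicate_name  sample_name
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split()
        if len(cols) != 4:
            problems.append(f"line {lineno}: expected 4 columns, found {len(cols)}")
            continue
        _matrix, _bed, replicate, sample = cols
        if replicate == sample:
            problems.append(
                f"line {lineno}: replicate and sample are both '{replicate}'"
            )
    return problems


good = "HMEC.mat HMEC.bed HMEC HMEC_sample\nMCF7.mat MCF7.bed MCF7 MCF7_sample\n"
old_style = "HMEC HMEC /path/to/HMEC\n"  # three-column layout from the question

print(check_input_rows(good))       # []
print(check_input_rows(old_style))  # ['line 1: expected 4 columns, found 3']
```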

Also, we recently made a major update to our code, so please use the current version (disregard this if you are already on the latest version). Give it a try and let us know how it goes. We are happy to help!

BenxiaHu commented 2 years ago

Thanks a lot.

When I run this command:

    Rscript dchicf.r --file ${inputfile1} --pcatype analyze --dirovwt T --diffdir Health_vs_case_100Kb --genome hg38

I get this error:

    Wrote intra_sample_chr9_combined.pcOri.bedGraph & _combined.pcQnm.bedGraph files under DifferentialResult/Health_vs_case_100Kb/pcOri & DifferentialResult/Health_vs_case_100Kb/pcQnm folders
    Error in [.data.frame(df_intra, , data_rep$prefix) : undefined columns selected
    Calls: pcanalyze -> mean -> [ -> [.data.frame
    Execution halted

It would be much better if dcHiC provided a function to remove non-standard chromosomes. Do you know how to solve this issue? Best,

ay-lab commented 2 years ago

Do you have chrY or chrM in the bed files? Also, can you paste the input.txt file here and the column names of the intra_sample_chr9_combined.pcOri.bedGraph? I think the error is due to a naming mismatch!

BenxiaHu commented 2 years ago

No, I deleted chrX/Y/M from the bed files in advance, but I did not delete chrX/Y/M from the matrix files.

The input.txt file has four columns:

    Health2_case_combine.matrix  Health2_case_combine.bed  Health2_case_combine  Health2_case_combine_sample
    Health2_ctrl_combine.matrix  Health2_ctrl_combine.bed  Health2_ctrl_combine  Health2_ctrl_combine_sample

bedGraph:

    head differential.intra_sample_chrX_combined.pcQnm.bedGraph
    chr   start    end      Health2_case_combine  Health2_ctrl_combine  Health2_case_combine_sample  Health2_ctrl_combine_sample  sample_maha           pval
    chrX  2700000  2800000  -0.20508              -0.11988              -0.20508                     -0.11988                     3.53271480798589e-05  0.995257669756777

I am not sure why chrX is still detected.
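The naming-mismatch check the maintainer asked for can be sketched in a few lines. The example below compares the replicate and sample names from the pasted input.txt with the pasted bedGraph header. It assumes that the names in columns 3 and 4 of input.txt are expected to appear as bedGraph column names, which matches the header shown here but is not confirmed against the dcHiC source. `missing_columns` is a hypothetical helper.

```python
def missing_columns(header_line: str, input_txt: str) -> list[str]:
    """Names from columns 3 and 4 of input.txt that are absent from the header."""
    header = set(header_line.split())
    expected = []
    for line in input_txt.splitlines():
        cols = line.split()
        if len(cols) == 4:
            for name in cols[2:]:  # replicate name, then sample name
                if name not in expected:
                    expected.append(name)
    return [name for name in expected if name not in header]


# Both values are copied from the comment above.
input_txt = """\
Health2_case_combine.matrix Health2_case_combine.bed Health2_case_combine Health2_case_combine_sample
Health2_ctrl_combine.matrix Health2_ctrl_combine.bed Health2_ctrl_combine Health2_ctrl_combine_sample
"""
header = (
    "chr start end Health2_case_combine Health2_ctrl_combine "
    "Health2_case_combine_sample Health2_ctrl_combine_sample sample_maha pval"
)

print(missing_columns(header, input_txt))  # []
```

For the chrX header pasted here the list is empty, so under this assumption the names line up for that file. The error was reported while processing chr9, so the same check would need to be run on the header of that file.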

ay-lab commented 2 years ago

Hi, thanks for pasting the files. Deleting the chromosomes from the bed file is good enough. dcHiC can handle chrX, so there is no need to delete chrX. chrY and chrM have too few reads and thus cause an error, so it is better to exclude them.
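Since the thread asks for a way to drop non-standard chromosomes, here is a minimal sketch of filtering a bed file so that only numbered chromosomes and chrX remain. It assumes the chromosome name is the first whitespace-separated field and uses the `chr` prefix seen in this thread. `filter_bed_lines` and the file names in the usage note are hypothetical, not part of dcHiC.

```python
import re

# Assumption: keep autosomes (chr1, chr2, ...) and chrX; drop chrY, chrM, and scaffolds.
KEEP = re.compile(r"^chr(\d+|X)$")


def filter_bed_lines(lines):
    """Yield only the bed lines whose first column is chr<number> or chrX."""
    for line in lines:
        fields = line.split()
        if fields and KEEP.match(fields[0]):
            yield line


example = [
    "chr1\t0\t100000\t1\n",
    "chrX\t0\t100000\t2\n",
    "chrY\t0\t100000\t3\n",
    "chrM\t0\t16569\t4\n",
    "chrUn_KI270302v1\t0\t2274\t5\n",
]

kept = list(filter_bed_lines(example))
print("".join(kept), end="")  # prints only the chr1 and chrX lines
```

To apply it to a real file, open the source and destination and write the filtered lines, for example `dst.writelines(filter_bed_lines(src))` with `src` reading a hypothetical `sample.bed` and `dst` writing `sample.filtered.bed`.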

I would suggest completely deleting the existing files/folders generated by dcHiC and running it again. To save time and resources, dcHiC reuses existing files when it detects them, and my guess is that it is doing that here too.

BenxiaHu commented 2 years ago

I reran it, but the same error occurred again. I do not know which step is incorrect.

ay-lab commented 2 years ago

I need to understand the data before I can help you further! If you look at the 'DifferentialResult/Health_vs_case_100Kb' folder you should see the following files -

    $ tree ./DifferentialResult/Health_vs_case_100Kb
    |-- differential.intra_sample_chr10_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr11_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr12_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr13_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr14_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr15_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr16_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr17_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr18_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr19_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr20_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr21_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr22_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr1_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr2_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr3_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr4_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr5_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr6_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr7_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr8_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chr9_combined.pcQnm.bedGraph
    |-- differential.intra_sample_chrX_combined.pcQnm.bedGraph
    |-- differential.intra_sample_combined.Filtered.pcQnm.bedGraph
    |-- differential.intra_sample_combined.pcQnm.bedGraph
    |-- differential.intra_sample_group.Filtered.pcOri.bedGraph
    |-- differential.intra_sample_group.Filtered.pcQnm.bedGraph
    |-- differential.intra_sample_group.pcOri.bedGraph
    `-- differential.intra_sample_group.pcQnm.bedGraph

Can you check your folder and let me know which files are missing? This will help me to point out the step where the program is failing!
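Comparing the folder against the listing by eye is error-prone, so the check can be scripted. The sketch below builds the per-chromosome file names following the pattern in the listing and reports which ones are absent. The chromosome set (chr1 to chr22 plus chrX) is taken from the listing, the helper names are hypothetical, and the six genome-wide `combined`/`group` files are not covered.

```python
from pathlib import Path


def expected_per_chromosome_files(chroms):
    """Per-chromosome file names, following the pattern in the listing above."""
    return [f"differential.intra_sample_{c}_combined.pcQnm.bedGraph" for c in chroms]


def missing_files(folder, chroms):
    """Return the expected per-chromosome files that do not exist in `folder`."""
    folder = Path(folder)
    return [
        name
        for name in expected_per_chromosome_files(chroms)
        if not (folder / name).is_file()
    ]


# chr1..chr22 plus chrX, as in the listing above.
chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX"]

for name in missing_files("DifferentialResult/Health_vs_case_100Kb", chroms):
    print("missing:", name)
```

The first missing chromosome is a reasonable hint for where the run stopped, although dcHiC may not process chromosomes in this order.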

katecycho commented 8 months ago

BenxiaHu wrote: "I reran it, but the same error occurred again. I do not know which step is incorrect."

Hi Benxia, do you by any chance remember how you fixed this issue? I am getting the same error. Thank you.