ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

error: arguments imply differing number of rows #83

Closed katecycho closed 9 months ago

katecycho commented 10 months ago

Hello,

I keep getting an error at the first step when running on cis. I have converted three .hic files to 50kb and 100kb and added a blacklist. I removed chrM in addition to chrY after the error first came up, which didn't resolve it. I also tried 100kb resolution, but the error is the same. From the output intra_pca folder, it seems that my first file is the issue. When I run at 50kb, it stops generating files at chr14 (only chr14.precmat.txt), but at 100kb it stops at chr11 (only chr11.precmat.txt). I get the same error when I run it on the next two files. Do you have any idea what the issue is? Thank you in advance.

Code:

Rscript dchicf.r --file input_matched.txt --pcatype cis --dirovwt T --cthread 2 --pthread 4

Error:

Performing block wise correlation calculation    : complete!
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: arguments imply differing number of rows: 1295, 1297
Calls: lapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Execution halted

katecycho commented 10 months ago

Actually, I do not get this error when I do not remove the blacklist, but I encounter a new one:

Error in checkForRemoteErrors(val) : 
  one node produced an error: upper value must be greater than lower value
Calls: lapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Execution halted

ay-lab commented 10 months ago

It seems like a formatting issue. Try using https://github.com/XiaoTaoWang/HiCLift to convert the .hic file to hicpro format. This may solve the issue. Otherwise, I will try to debug more. Thanks.

katecycho commented 9 months ago

Thank you! How can I generate the bias file after the .hic to hicpro conversion? For .mcool files, I believe the bias is the same as the weight. Is that correct? Thank you

ay-lab commented 9 months ago

Seems like it, you can try it out. Here is the section from FitHiC that describes how to generate the bias file: https://github.com/ay-lab/fithic?tab=readme-ov-file#hickry

katecycho commented 9 months ago

> It seems like a formatting issue. Try using https://github.com/XiaoTaoWang/HiCLift to convert the .hic file to hicpro format. This may solve the issue. Otherwise, I will try to debug more. Thanks.

HiCLift doesn't convert .hic to hicpro. HiCExplorer has that, but it seems to have some bugs. Do you have any other suggestions? Thank you!

> Seems like it, you can try it out

For .mcool files, the weights are multiplicative, and they do not average around 1 in the bias files I generate with cooler dump.
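
A minimal sketch of one way to turn cooler balancing weights into a bias vector centered on 1, in case it helps. It rests on assumptions that are not confirmed in this thread: that the weights were exported as one value per bin (NaN for masked bins), that the bias is the reciprocal of the weight (cooler multiplies raw counts by the weights, whereas HiC-Pro/FitHiC style biases divide them), and that rescaling to a mean of 1 is acceptable for dcHiC. The function name weights_to_bias is hypothetical.

```python
import math

def weights_to_bias(weights):
    """Convert multiplicative balancing weights to biases with mean 1.

    Assumption: bias = 1 / weight, then rescaled so valid bins average 1.
    Masked bins (NaN or non-positive weight) stay NaN.
    """
    raw = [
        1.0 / w if (not math.isnan(w) and w > 0) else math.nan
        for w in weights
    ]
    valid = [b for b in raw if not math.isnan(b)]
    if not valid:
        return raw
    mean = sum(valid) / len(valid)
    return [b / mean if not math.isnan(b) else math.nan for b in raw]

# Toy weights only; real values would come from the "weight" column.
bias = weights_to_bias([0.5, 0.25, float("nan"), 1.0])
```

Whether masked bins should stay NaN or be written as some other placeholder depends on what the downstream tool expects, so that part is worth checking against its documentation.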

ay-lab commented 9 months ago

Use HiCLift first to convert the .hic file into pairs format -

HiCLift --input <hicfile> --input-format hic --output-format pairs --out-pre <prefix>

Convert the pairs into HiC-Pro valid pair format -

grep -v "^#" <pairsfile> |awk -v OFS='\t' '{print $1,$2,$3,$6,$4,$5,$7}' > <validpairfile>

Then download build_matrix.cpp from the HiC-Pro repo and compile it. Let's say that after compiling you have the build_matrix executable; then use the following command to convert the valid pair file into HiC-Pro format -

./build_matrix --binsize <resolution> --chrsizes <chrom_hg38.sizes> --ifile <validpairfile> --oprefix <prefix> --matrix-format upper

With the files in HiC-Pro format, you can follow this link https://github.com/ay-lab/fithic?tab=readme-ov-file#hickry to calculate the bias file.

I hope these steps will resolve the issue.

Let me know if you face issues.
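
For anyone who prefers Python to awk, the column reordering in the second step can be sketched as follows. This assumes the .pairs columns are readID, chr1, pos1, chr2, pos2, strand1, strand2 and simply mirrors the awk field order $1,$2,$3,$6,$4,$5,$7; the function name pairs_to_validpairs is hypothetical.

```python
def pairs_to_validpairs(lines):
    """Yield lines ordered as readID, chr1, pos1, strand1, chr2, pos2, strand2."""
    for line in lines:
        if line.startswith("#"):
            # Skip .pairs header lines, like grep -v "^#".
            continue
        f = line.split()
        if len(f) < 7:
            # Unlike the awk command, short or empty lines are dropped here.
            continue
        # Same field order as the awk command: $1,$2,$3,$6,$4,$5,$7
        yield "\t".join([f[0], f[1], f[2], f[5], f[3], f[4], f[6]])

example = [
    "## pairs format v1.0\n",
    "r1\tchr1\t100\tchr2\t200\t+\t-\n",
]
converted = list(pairs_to_validpairs(example))
```

Passing an open file handle instead of the example list should work the same way, since the function only iterates over lines.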

katecycho commented 9 months ago

Thank you so much for the help. I was able to convert to matrix and bed files and successfully ran --pcatype cis and --pcatype select. However, I encounter this error with --pcatype analyze. It occurs after pcOri, pcQnm and the individual folders for the samples have been generated in the indicated --diffdir. Do you know what the issue may be? I tried updating the data.table package, but that is not the issue. I think the same issue was observed in https://github.com/ay-lab/dcHiC/issues/36. I apologize for bothering you with new issues and questions every time, but I really appreciate the help!

Error in `[.data.frame`(df_intra, , data_rep$prefix) : undefined columns selected
Calls: pcanalyze -> mean -> [ -> [.data.frame
Execution halted

ay-lab commented 9 months ago

Can you please share the input.txt file information here?

It seems to be looking up column names from data_rep$prefix but can't find one of them.

Can you also share the contents of the pcOri folder? It will help me to debug this further!

katecycho commented 9 months ago

> Can you please share the input.txt file information here?
>
> It seems to be looking up column names from data_rep$prefix but can't find one of them.
>
> Can you also share the contents of the pcOri folder? It will help me to debug this further!

Yes. Could I email this to you instead?

ay-lab commented 9 months ago

Sure, abhijit@lji.org

ay-lab commented 9 months ago

Thanks for sending over the files, and I am glad you figured out the issue.

So that everyone knows, the issue was due to names in the input.txt (3rd and 4th columns) starting with numbers. Renaming them so that they start with a letter instead of a number should resolve the issue.
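
A small pre-flight check along these lines might catch the problem before running dcHiC. It is only a sketch: it assumes input.txt is tab- or whitespace-delimited with the names in the 3rd and 4th columns, and it only flags names that begin with a digit. A likely cause, though not confirmed in this thread, is that R prepends "X" to data frame column names that start with a digit, so the original name no longer matches a column. The function name find_numeric_prefixes is hypothetical.

```python
def find_numeric_prefixes(rows):
    """Return (line number, column number, name) for names starting with a digit."""
    bad = []
    for lineno, row in enumerate(rows, start=1):
        fields = row.split()
        if len(fields) < 4:
            # Skip blank or incomplete lines.
            continue
        for col in (2, 3):  # 3rd and 4th columns (0-based indices 2 and 3)
            if fields[col][0].isdigit():
                bad.append((lineno, col + 1, fields[col]))
    return bad

# Made-up rows for illustration; real rows come from input.txt.
example = [
    "s1.matrix s1.bed 1hr_rep1 1hr",
    "s2.matrix s2.bed ctrl_rep1 ctrl",
]
problems = find_numeric_prefixes(example)
```

To check a real file, the rows can be read with open("input.txt") and passed in directly; any entry in the returned list points to a name worth renaming.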