ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

error in hclust (--pcatype select) #47

Closed Nico-FR closed 1 year ago

Nico-FR commented 1 year ago

Hello,

The first step Rscript dchicf.r --file input_files.txt --pcatype cis is working very well for the compartment calling.

But I am getting an error for the second step Rscript dchicf.r --file input_files.txt --pcatype select:

Error in hclust(as.dist(round(1 - cor(pc.mat), 4))) : 
  NA/NaN/Inf in foreign function call (arg 10)
Calls: pcselect -> pcselectioncore -> hclust
Execution halted

It seems to work for the first two samples, since stdout shows:

Running  intra   1  in  poll_0197  sample
Running  intra   1  in  poll_3654  sample

Here are my input files:

Bovin-0197.ARS-UCD1.2.mapq_10.50000.txt   Bovin-0197.ARS-UCD1.2.mapq_10.50000.bed   poll_0197   poll
Bovin-3654.ARS-UCD1.2.mapq_10.50000.txt   Bovin-3654.ARS-UCD1.2.mapq_10.50000.bed   poll_3654   poll
Bovin-669.ARS-UCD1.2.mapq_10.50000.txt    Bovin-669.ARS-UCD1.2.mapq_10.50000.bed    unp_669     unp
Bovin-977.ARS-UCD1.2.mapq_10.50000.txt    Bovin-977.ARS-UCD1.2.mapq_10.50000.bed    unp_977     unp

Any idea?

ay-lab commented 1 year ago

That part of the code tries to select the best PC out of all those calculated. My guess is that if one of the chromosomes has a very low correlation with either the transcription start sites or the GC content, it may give this error. I would first suggest checking for anomalies in the unp PC values (for example, by plotting them with a custom R script). Sometimes, for unconventional genomes, you may need to hand-pick PCs for some of the chromosomes. Let me know how it goes; happy to help you out.
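As a rough illustration of how such a failure can arise: the failing call is hclust(as.dist(round(1 - cor(pc.mat), 4))), and in R cor() returns NA for any column with zero variance, which hclust then rejects. The Python sketch below mimics that check on toy data. The column names ("PC1", "PC2", "gc") and the values are made up for illustration, and the actual contents of pc.mat inside dcHiC are not shown in this thread, so treat this as a generic diagnostic rather than a description of dcHiC internals.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when it is undefined."""
    n = len(x)
    if n < 2 or any(not math.isfinite(v) for v in list(x) + list(y)):
        return math.nan
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant column has no variance, so the correlation is undefined.
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def undefined_pairs(columns):
    """Return the pairs of column names whose correlation is NaN."""
    names = list(columns)
    bad = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.isnan(pearson(columns[a], columns[b])):
                bad.append((a, b))
    return bad

# Toy data (hypothetical): "gc" is all zeros, as could happen if no bin
# received an annotation value.
pc_mat = {
    "PC1": [0.5, -0.2, 0.1, -0.4],
    "PC2": [0.1, 0.3, -0.2, -0.2],
    "gc": [0.0, 0.0, 0.0, 0.0],
}
print(undefined_pairs(pc_mat))
```

Any pair reported here would produce a missing value in the distance matrix, which is the kind of input that makes a clustering step stop with an NA/NaN/Inf error.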

Nico-FR commented 1 year ago

OK, I checked the PCs, and PC1 is always the right one, except for chromosome X. I tried without the X, but the same error occurs. The quality is the same for the poll and unp samples, so I do not understand why it does not work for unp.

Example of PC1 for an unp sample: [image]

Example of PC2 for an unp sample: [image]

I also have expression data for these individuals, so I can easily select the PCs and orient them.
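For readers unfamiliar with the idea, orienting a PC usually means flipping its sign so that positive values track the active signal. The Python sketch below does this using the sign of the covariance between PC values and a per-bin expression track. The function name orient_pc and the toy numbers are hypothetical, the "positive = active" convention is an assumption, and this is independent of how dcHiC or reselectpc.r handle orientation.

```python
def orient_pc(pc_values, signal):
    """Return pc_values, sign-flipped if they vary inversely with signal.

    The sign of the covariance equals the sign of the Pearson correlation,
    so the covariance alone is enough to decide the orientation.
    """
    n = len(pc_values)
    mean_pc = sum(pc_values) / n
    mean_sig = sum(signal) / n
    cov = sum((p - mean_pc) * (s - mean_sig) for p, s in zip(pc_values, signal))
    return [-p for p in pc_values] if cov < 0 else list(pc_values)

# Toy per-bin values (hypothetical): high expression falls in negative PC1
# bins, so the PC needs to be flipped.
pc1 = [-0.8, -0.5, 0.4, 0.9]
expression = [12.0, 9.0, 1.0, 0.5]
oriented = orient_pc(pc1, expression)
print(oriented)
```

The two lists must cover the same bins in the same order; bins without expression data would need to be dropped from both before calling the function.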

I will try the next step with Rscript utility/reselectpc.r --reselect ref, but I do not understand how it works. How can I use my own oriented bedGraph with PC1 values?

Nico-FR commented 1 year ago

OK, my bad! The chromosomes in my matrices are named "1", whereas they are named "chr1" in the UCSC golden path files... The stdout put us on the wrong track.
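A quick way to catch this kind of mismatch before running the pipeline is to compare the chromosome names in the two files. The Python sketch below is a minimal, standalone check. It assumes both inputs are whitespace-delimited with the chromosome name in the first column, and the toy lines stand in for a bin .bed file and a UCSC-style annotation; the function names are hypothetical and not part of dcHiC.

```python
def chrom_names(lines):
    """Collect the first column (chromosome name) from delimited lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names

def strip_chr(name):
    """Drop a leading 'chr' prefix if present."""
    return name[3:] if name.startswith("chr") else name

def compare_naming(bins, annotation):
    """Report shared names and whether a 'chr' prefix explains a mismatch."""
    a, b = chrom_names(bins), chrom_names(annotation)
    shared = a & b
    shared_without_prefix = {strip_chr(n) for n in a} & {strip_chr(n) for n in b}
    return {
        "shared": sorted(shared),
        "prefix_mismatch": not shared and bool(shared_without_prefix),
    }

# Toy lines (hypothetical): bins use "1", the annotation uses "chr1".
bins = ["1\t0\t50000\t1", "1\t50000\t100000\t2", "2\t0\t50000\t3"]
annotation = ["chr1\t1000\t2000\tgeneA", "chr2\t500\t900\tgeneB"]
print(compare_naming(bins, annotation))
```

To run it on real files, pass open file handles instead of the toy lists, for example compare_naming(open(bed_path), open(annotation_path)) with your own paths. If prefix_mismatch is true, renaming the chromosomes in one of the two files so that both use the same convention should remove the mismatch.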

ay-lab commented 1 year ago

Added a note in the documentation to look out for this! Thank you for raising the issue.