ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License
57 stars 10 forks source link

run err #33

Closed Nuturetree closed 2 years ago

Nuturetree commented 2 years ago

Hi author when I was run "Rscript /public/home/xhhuang/biosoft/dcHiC/dchicf.r --file input_f2.txt --pcatype cis --dirovwt T --cthread 1 --pthread 1", the result show that

Error in $<-.data.frame(*tmp*, "expcc", value = 1) : replacement has 1 row, data has 0 Calls: lapply -> FUN -> $<- -> $<-.data.frame Execution halted

I am sure the input file is ture. the input file format is that ./data/DPA0_chr1.matrix ./data/chr1_abs.bed DPA0_100kb DPA0 ./data/DPA5_chr1.matrix ./data/chr1_abs.bed DPA5_100kb DPA5

can you tell me how I can resolve this problem?

thanks

Nuturetree commented 2 years ago

can you give me some test data? I do not find the test data of the matrix.

ay-lab commented 2 years ago

This is probably due to the presence of non-standard chromosomes like chrM, chrY, random in the bed file. Most of the time these chromosomes don't have enough data points to perform the pca calculation and thus raise the error. Please try to remove those chromosomes from the bed file and try it again. There is a demo file inside the "packages" directory, but it doesn't have the step 1 input files because of size issues.

Nuturetree commented 2 years ago

Hi author, Thank you for your patient answer. According to your suggestion, I modify the matrix and bed file by removing the scaffold interaction. But this problem still keeping. After reading your article, I have great agreed this tool is very useful. So I want to run it, complementary. If the matrix only contains intrachromosomal interaction? If can give me a download link to the raw input matrix file? Thanks very much.

Here is the command to run and all the output

(dchic3) [hhh@sg01 dchic]$ Rscript ~/biosoft/dcHiC/dchicf.r --file input_f2.txt --pcatype cis --dirovwt T --cthread 1 --pthread 1 Reading ~/biosoft/soft_test/dchic/data/DPA0_noscaff.matrix Hi-C matrix file Reading ~/biosoft/soft_test/dchic/data/100K_abs_rename.bed Hi-C bed file A B Weight chr1 pos1 chr2 pos2 dist 1: 1 1 2944 chr1 0 chr1 0 0 2: 1 2 1144 chr1 0 chr1 100000 100000 3: 1 3 361 chr1 0 chr1 200000 200000 4: 1 4 274 chr1 0 chr1 300000 300000 5: 1 5 237 chr1 0 chr1 400000 400000 6: 1 6 212 chr1 0 chr1 500000 500000 Calculating expected counts from chromosome wise background dist Weight 1 0 4136745 2 100000 2203017 3 200000 1036225 4 300000 728007 5 400000 567312 6 500000 470337 Error in $<-.data.frame(*tmp*, "expcc", value = 1) : replacement has 1 row, data has 0 Calls: lapply -> FUN -> $<- -> $<-.data.frame Execution halted

ay-lab commented 2 years ago

Hi Nuturetree,

Thanks for trying out the option. I can help you to resolve this. Can you please attach the rename.bed (or send the bed file to abhijit@lji.org) file and show the contents inside the *_pca folder (a tree or by ls will be good enough)?

ay-lab commented 2 years ago

Hi, I recreated the issue and resolved it. It was a minor bug in an if-else conditional statement to precheck the data.It is now fixed and I have successfully run the PCA part.  Please download the latest dchic.r code before running. Also please note that as this is not the standard genome, to run the next steps you will need a few custom files. It is easy to generate. Please check the following instruction. 

Support for other genomes: While it has only been extensively tested for human and mouse genomes, dcHiC supports most other commonly-used genomes that are under the UCSC genome page. To utilize this, create a folder {genome}_{resolution}_goldenpathData (e.g hg38_100000_goldenpathData). Within that folder put three files: a) {genome}.fa (e.g. hg38.fa from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz) b) {genome}.tss.bed (e.g. hg38.tss.bed, this is simply the transciption start site position of the https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/hg38.refGene.gtf.gz file. Please make sure tss position is selected based on the strand direction) & c) {genome}.chrom.sizes files (e.g. hg38.chrom.sizes from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes). These files can be found under the UCSC bigZips page for the specified genome. When running dcHiC use the --gfolder option in the select step to provide the folder path, and dcHiC will create the necessary files.

Nuturetree commented 2 years ago

Hi author Thanks very much. Now, I can run step one. But I find other defects that other software may also have. The PCA of some chromosomes could need manual correction of direction. Because the ends of the chromosomes are considered to be in the B compartment state, while the middle is with A compartment. cworld (https://github.com/dekkerlab/cworld-dekker) can solve this problem very well by introducing gene annotation files. It's not a big issue, but researchers are always a group of people who seek perfection.

chr start end index PC1 PC2 chr4 0 100000 1 -42.6807006823992 -17.143626903196 chr4 100000 200000 2 -42.6224189766547 -15.7242452633627 chr4 200000 300000 3 -45.9440489436852 -18.4126008821374 chr4 300000 400000 4 -44.1960684724032 -9.48118611043915 chr4 400000 500000 5 -45.2648798196879 -13.2649141833863 chr4 500000 600000 6 -32.2657353164629 -1.92148823251993 chr4 600000 700000 7 -44.2155975779798 -16.6327381340201 chr4 700000 800000 8 -44.6377228668559 -16.0412698898729 chr4 800000 900000 9 -48.0361571926599 -16.7706046991374 ....................................................................................................................................................... ....................................................................................................................................................... ....................................................................................................................................................... chr4 80600000 80700000 807 -46.3940201501861 -21.5823589553061 chr4 80700000 80800000 808 -36.9500763028478 -10.0298400847981 chr4 80800000 80900000 809 -37.4330205754868 -9.91408640284615 chr4 80900000 81000000 810 -48.202315834634 -16.7962842373112 chr4 81000000 81100000 811 -46.8138026151128 -23.8354798721132 chr4 81100000 81200000 812 -46.7543141858341 -21.835352971769 chr4 81200000 81300000 813 -45.8447098516168 -18.7034110203084 chr4 81300000 81400000 814 -43.4539382427937 -20.0584706525769 chr4 81400000 81500000 815 -39.4605390988887 -12.8250742305871 chr4 81500000 81554553 816 -32.3894932027343 -8.22725357123237

I hope this will help you to improve the software Nuturetree

Nuturetree commented 2 years ago

I found that you solved this problem in the second step

csijcs commented 2 years ago

I am having a similar issue, and I am using the updated dchicf.r script. When I run:

Rscript ~/dcHiC/dchicf.r --file 40kb_input.txt --pcatype subcomp --dirovwt T --diffdir all_samples_40Kb I get the error:

Error in `[<-.data.frame`(`*tmp*`, , "state", value = 1:6) :
  replacement has 6 rows, data has 3
Calls: subcompartment -> hmmsegment -> [<- -> [<-.data.frame
Execution halted
ay-lab commented 2 years ago

Hi,

This is a separate error related to subcompartment detection using hidden markov model. Did you manage to run the "--pcatype cis" step? And I request you to please raise this as a separate issue.

Thanks a lot.

csijcs commented 2 years ago

Sorry, I thought the two might be related. I will make a new issue of it