ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

Summary error #14

Closed biozzq closed 3 years ago

biozzq commented 3 years ago

Dear all, after finishing the chromosome-by-chromosome analysis, I got an error when generating the bedgraph results for all pairwise comparisons with the following command:

python /dchic/differentialCalling.py -inputFile input.txt -chrFile chr.txt -makePlots 1 -res 100000 -genome mm10 -multiComp 1 -blacklist /mm10/mm10-blacklist.v2.bed

DifferentialCompartment  folder created
Learned parameter file found. Using the IHW to boost statistical power
Running  10MF   /02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt 
Error in `[.data.frame`(df, , selected) : undefined columns selected
Calls: diffcmp -> apply -> [ -> [.data.frame
Execution halted

Sincerely, Zheng zhuqing

biozzq commented 3 years ago

Dear @ay-lab

Could you take some time to help me with this problem? I have also attached a screenshot of the contents of chrX.PC.coordinates.txt. Thank you very much.

Best, Zheng zhuqing

ay-lab commented 3 years ago

Hi There,

I'm very sorry about the late response. I ran the differential calling segment standalone on my end and it worked fine, but I have a few ideas about what might be happening. First, your output stops right after it prints "Running" for 10MF. Based on your PC.coordinates.txt, it should look like this:

Running 10MF /02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt
Running 10MM /02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt
Running 8WM /02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt
Running WT /02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt

Can you double-check your input file to make sure everything is right (same order of experiments/groupings as listed in PC coordinates, etc.)? dcHiC should also print the associated R call, "Rscript /path/to/diffcmp_pythonV.r ...". Can you run that standalone?

biozzq commented 3 years ago

Dear @ay-lab

No worries about the delay. I ran the R script standalone, and it gives the same error. The R command is as follows:

Rscript /dcHiC/dchic/diffcmp_pythonV.r chr_10/chr10.PC.coordinates.txt chr_11/chr11.PC.coordinates.txt chr_12/chr12.PC.coordinates.txt chr_13/chr13.PC.coordinates.txt chr_14/chr14.PC.coordinates.txt chr_15/chr15.PC.coordinates.txt chr_16/chr16.PC.coordinates.txt chr_17/chr17.PC.coordinates.txt chr_18/chr18.PC.coordinates.txt chr_19/chr19.PC.coordinates.txt chr_1/chr1.PC.coordinates.txt chr_2/chr2.PC.coordinates.txt chr_3/chr3.PC.coordinates.txt chr_4/chr4.PC.coordinates.txt chr_5/chr5.PC.coordinates.txt chr_6/chr6.PC.coordinates.txt chr_7/chr7.PC.coordinates.txt chr_8/chr8.PC.coordinates.txt chr_9/chr9.PC.coordinates.txt chr_X/chrX.PC.coordinates.txt 1 chr.txt 100000 chrdistances.txt samplefile.txt /path/dchic

The screen output is as follows:

[1] "/path/dcHiC/dchic"
[1] "Parameter file found."
Read 20 items
 [1] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt"  
 [2] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_9/chr9.PC.coordinates.txt"  
 [3] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_8/chr8.PC.coordinates.txt"  
 [4] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_7/chr7.PC.coordinates.txt"  
 [5] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_6/chr6.PC.coordinates.txt"  
 [6] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_5/chr5.PC.coordinates.txt"  
 [7] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_4/chr4.PC.coordinates.txt"  
 [8] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_3/chr3.PC.coordinates.txt"  
 [9] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_2/chr2.PC.coordinates.txt"  
[10] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_1/chr1.PC.coordinates.txt"  
[11] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_19/chr19.PC.coordinates.txt"
[12] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_18/chr18.PC.coordinates.txt"
[13] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_17/chr17.PC.coordinates.txt"
[14] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_16/chr16.PC.coordinates.txt"
[15] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_15/chr15.PC.coordinates.txt"
[16] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_14/chr14.PC.coordinates.txt"
[17] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_13/chr13.PC.coordinates.txt"
[18] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_12/chr12.PC.coordinates.txt"
[19] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_11/chr11.PC.coordinates.txt"
[20] "/02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_10/chr10.PC.coordinates.txt"
DifferentialCompartment  folder already exists
Learned parameter file found. Using the IHW to boost statistical power
Running  10MF  /02.Results/02.dcHiC/01.mm10/02.output/00.100K_resolution/chr_X/chrX.PC.coordinates.txt 
Error in `[.data.frame`(df, , selected) : undefined columns selected
Calls: diffcmp -> apply -> [ -> [.data.frame
Execution halted

The contents of chr.txt, chrdistances.txt and samplefile.txt are as follows:

cat chr.txt
10
11
12
13
14
15
16
17
18
19
1
2
3
4
5
6
7
8
9
X
cat chrdistances.txt
m   s   chr
0.04167433907202319 0.037840360800702126    10
0.03915005434671804 0.0359192773767521  11
0.045147307972388355    0.0430354037903573  12
0.06023429274654387 0.0540414496247747  13
0.04428464213452534 0.042589826860710805    14
0.04427804765959579 0.04074455027170153 15
0.0449592172300395  0.04181659531125885 16
0.03723546086675897 0.03656691368422088 17
0.044145818044745856    0.04062951935154244 18
0.05704582962055796 0.05169311110713221 19
0.04660557234452178 0.041212480369738534    1
0.04890091905467735 0.04364167179948949 2
0.04584461407776562 0.039609724271287415    3
0.04666640704810197 0.044430134768668486    4
0.05056199711807708 0.04814993167680187 5
0.041593477068725515    0.03721484267812039 6
0.04716659951394381 0.04869042361969862 7
0.041478786162858855    0.04019080957219786 8
0.04649006903704483 0.04240341463851759 9
0.23845378515510865 0.22166782326264944 X
cat samplefile.txt
replicate   prefix
10MF.1  10MF
10MF.2  10MF
10MM.1  10MM
10MM.2  10MM
8WM.1   8WM
8WM.2   8WM
WT.1    WT
WT.2    WT
WT.3    WT

I hope this information helps you track down the problem.

Best wishes, Zheng zhuqing

ay-lab commented 3 years ago

Hi There!

After a lot of poking around, I believe I found the source of this error: R does not accept syntactically invalid column names (in this case, names that start with a number) and automatically coerces them into valid ones by prepending an "X" (so "8WM.1" becomes "X8WM.1"). The columns are then no longer found under the names given in the sample file, which would explain the "undefined columns selected" message. This should be fixed (fingers crossed) with the updated diffcmp_pythonV.r script. Can you try it out and see what the result is?
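
For anyone who wants to check their own sample names before running the differential step, here is a minimal Python sketch of the renaming described above. It is not part of dcHiC, and the function names are made up for illustration. It only approximates what R's make.names() does (ASCII names only, reserved words ignored), and it assumes the table is read with R's default name checking switched on, which I have not verified against the script itself.

```python
import re

# A syntactically valid R name starts with a letter, or with a dot that is
# not followed by a digit. Assumption: ASCII only, reserved words ignored.
_VALID_START = re.compile(r"[A-Za-z]|\.(?![0-9])")


def r_would_rename(name: str) -> bool:
    """Return True if R would probably alter this column name on import."""
    return _VALID_START.match(name) is None or re.search(r"[^A-Za-z0-9._]", name) is not None


def approx_make_names(name: str) -> str:
    """Rough imitation of R's make.names() for simple ASCII names."""
    if _VALID_START.match(name) is None:
        name = "X" + name  # R prepends "X" when the start is invalid
    return re.sub(r"[^A-Za-z0-9._]", ".", name)  # other invalid characters become "."


# Replicate names taken from the samplefile.txt posted in this thread.
replicates = ["10MF.1", "10MF.2", "10MM.1", "10MM.2",
              "8WM.1", "8WM.2", "WT.1", "WT.2", "WT.3"]

flagged = {name: approx_make_names(name) for name in replicates if r_would_rename(name)}
for original, renamed in flagged.items():
    print(f"{original} -> {renamed}")
```

With the names from this issue, every replicate except the WT ones is flagged, which fits the run stopping at the very first group (10MF). If you cannot update the script, renaming the replicates so they start with a letter should sidestep the problem.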

biozzq commented 3 years ago

Dear @ay-lab

Yes, the corrected version runs smoothly. Thank you for your help and patience.

Best, Zheng zhuqing