ay-lab / fithic

Fit-Hi-C is a tool for assigning statistical confidence estimates to chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C.
MIT License

Fragment length is not consistent with fithic resolution (-r) in tests data #48

Status: Open. bnjvrjnke opened this issue 3 years ago.

bnjvrjnke commented 3 years ago

When running the tests on the bundled data, I noticed that the resolution (-r) is set to 100000 (100 kb), whereas the input file uses a fragment (bin) size of 1000000 (1 Mb).

Here is the relevant part of the test script:

# fithic/fithic/tests/run_tests-git.sh, lines 46-49
for i in Dixon_IMR90_HindIII_hg19_w100000; do
    python3 ../fithic.py -r 100000 -l "$i" -i $inI/$i.gz -f $inF/$i.gz -b $noOfBins -p $noOfPasses -o outputs/${i}.interOnly -x interOnly
    python3 ../fithic.py -r 100000 -l "$i" -i $inI/$i.gz -f $inF/$i.gz -b $noOfBins -p $noOfPasses -o outputs/${i}.all -x All
done

Here is an excerpt of the test data:

#/fithic/fithic/tests/contactCounts/Dixon_IMR90_HindIII_hg19_w100000.gz
chr10   500000  chr10   500000  13850
chr10   500000  chr10   1500000 3472
chr10   500000  chr10   10500000        370
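The 1 Mb spacing is already visible in these rows (500000 to 1500000). One way to check a contact-count file before choosing -r is to take the smallest positive distance between the two midpoints of any intra-chromosomal pair. Below is a minimal sketch of that check, assuming the whitespace-separated five-column layout shown above (chr1, mid1, chr2, mid2, count) and that at least one pair of adjacent bins appears in the file. `infer_bin_size` is a hypothetical helper, not part of fithic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fithic): estimate the bin size of a
# gzipped contact-count file as the smallest positive distance between
# the two midpoints of any intra-chromosomal pair.
# Assumed columns: chr1 mid1 chr2 mid2 count (whitespace-separated).
infer_bin_size() {
    gzip -dc "$1" | awk '
        $1 == $3 {                      # intra-chromosomal pairs only
            d = $4 - $2
            if (d < 0) d = -d           # midpoints may be in either order
            if (d > 0 && (min == "" || d < min)) min = d
        }
        END { if (min != "") print min }'
}
```

Applied to the three rows above, it prints 1000000, which agrees with the "Range of observed genomic distances [1000000 249000000]" line in the log below. If a file never contains two adjacent bins, the sketch over-estimates the bin size, so treat its output as a sanity check on -r rather than a substitute for knowing how the matrix was binned.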

Here is the log file:

#Dixon_IMR90_HindIII_hg19_w100000.fithic.log
Interactions file read successfully
------------------------------------------------------------------------------------
Observed, Intra-chr in range: pairs= 215762      totalCount= 91387585
Observed, Intra-chr all: pairs= 218642   totalCount= 121700752
Observed, Inter-chr all: pairs= 3878618  totalCount= 99952107
Range of observed genomic distances [1000000 249000000]

Making equal occupancy bins
------------------------------------------------------------------------------------
Observed intra-chr read counts in range 91387585
Desired number of contacts per bin      456937.925,
Number of bins  200
Equal occupancy bins generated

Looping through all possible fragment pairs in-range
------------------------------------------------------------------------------------
Chromosome 'chr1',      250 mappable fragments,         -2487765 possible intra-chr fragment pairs in range,    715750 possible inter-chr fragment pairs
Chromosome 'chr10',     136 mappable fragments,         -733191 possible intra-chr fragment pairs in range,     404872 possible inter-chr fragment pairs
Chromosome 'chr11',     136 mappable fragments,         -733191 possible intra-chr fragment pairs in range,     404872 possible inter-chr fragment pairs
Chromosome 'chr12',     134 mappable fragments,         -711689 possible intra-chr fragment pairs in range,     399186 possible inter-chr fragment pairs
Chromosome 'chr13',     116 mappable fragments,         -532571 possible intra-chr fragment pairs in range,     347652 possible inter-chr fragment pairs
Chromosome 'chr14',     108 mappable fragments,         -461283 possible intra-chr fragment pairs in range,     324540 possible inter-chr fragment pairs
Chromosome 'chr15',     103 mappable fragments,         -419328 possible intra-chr fragment pairs in range,     310030 possible inter-chr fragment pairs
Chromosome 'chr16',     91 mappable fragments,  -326796 possible intra-chr fragment pairs in range,     275002 possible inter-chr fragment pairs
Chromosome 'chr17',     82 mappable fragments,  -264957 possible intra-chr fragment pairs in range,     248542 possible inter-chr fragment pairs
Chromosome 'chr18',     79 mappable fragments,  -245784 possible intra-chr fragment pairs in range,     239686 possible inter-chr fragment pairs
Chromosome 'chr19',     60 mappable fragments,  -141075 possible intra-chr fragment pairs in range,     183180 possible inter-chr fragment pairs
Chromosome 'chr2',      244 mappable fragments,         -2369499 possible intra-chr fragment pairs in range,    700036 possible inter-chr fragment pairs
Chromosome 'chr20',     64 mappable fragments,  -160719 possible intra-chr fragment pairs in range,     195136 possible inter-chr fragment pairs
Chromosome 'chr21',     49 mappable fragments,  -93654 possible intra-chr fragment pairs in range,      150136 possible inter-chr fragment pairs
Chromosome 'chr22',     52 mappable fragments,  -105627 possible intra-chr fragment pairs in range,     159172 possible inter-chr fragment pairs
Chromosome 'chr3',      199 mappable fragments,         -1574304 possible intra-chr fragment pairs in range,    579886 possible inter-chr fragment pairs
Chromosome 'chr4',      192 mappable fragments,         -1465167 possible intra-chr fragment pairs in range,    560832 possible inter-chr fragment pairs
Chromosome 'chr5',      181 mappable fragments,         -1301586 possible intra-chr fragment pairs in range,    530692 possible inter-chr fragment pairs
Chromosome 'chr6',      172 mappable fragments,         -1174947 possible intra-chr fragment pairs in range,    505852 possible inter-chr fragment pairs
Chromosome 'chr7',      160 mappable fragments,         -1016175 possible intra-chr fragment pairs in range,    472480 possible inter-chr fragment pairs
Chromosome 'chr8',      147 mappable fragments,         -857172 possible intra-chr fragment pairs in range,     436002 possible inter-chr fragment pairs
Chromosome 'chr9',      142 mappable fragments,         -799617 possible intra-chr fragment pairs in range,     421882 possible inter-chr fragment pairs
Chromosome 'chrX',      156 mappable fragments,         -965811 possible intra-chr fragment pairs in range,     461292 possible inter-chr fragment pairs
Chromosome 'chrY',      60 mappable fragments,  -141075 possible intra-chr fragment pairs in range,     183180 possible inter-chr fragment pairs
Number of all fragments= 3113
Possible, Intra-chr in range: pairs= -19082983 
Possible, Intra-chr all: pairs= 241996.0 
Possible, Inter-chr all: pairs= 4604945.0 
Desired genomic distance range   [0 inf] 
Range of possible genomic distances  [100000  249450000] 
Baseline intrachromosomal probability is 4.13229970743318e-06 
Interchromosomal probability is 2.1715785964870374e-07 
5th quantile of biases: 0.57080572791248
50th quantile of biases: 1.01076079547
95th quantile of biases: 1.20269227401
Out of 3053 loci 85 were discarded with biases not in range [0.5 2]

Calculating probability means and standard deviations of contact counts
------------------------------------------------------------------------------------
Means and error written to outputs/Dixon_IMR90_HindIII_hg19_w100000.all/Dixon_IMR90_HindIII_hg19_w100000.fithic_pass1.res100000.txt

Fitting a univariate spline to the probability means
-----------------------------------------------------------------------------------
Spline successfully fit

The line 'Possible, Intra-chr in range: pairs= -19082983' looks wrong, since a number of pairs cannot be negative. If I set -r to 1000000, the 'Possible, Intra-chr in range' count becomes positive and the number of significant interactions drops sharply. Shouldn't the resolution parameter (fithic -r) match the fragment (bin) size used in the input file, i.e. 1000000 for Dixon_IMR90_HindIII_hg19_w100000.gz?
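The negative value has a simple arithmetic explanation. On a chromosome split into n equal bins, exactly n - k bin pairs lie k bins apart, so summing n - k over k = 1..n-1 gives n(n-1)/2 pairs. If separations are counted in 100 kb steps while the bins are really 1 Mb wide, the sum runs roughly ten times past the last real bin and the terms n - k turn negative. The shell sketch below only illustrates this arithmetic and is not fithic's code. The cut-off K = 10n - 6 is an assumption back-solved from the log, not read from fithic's source, but with the per-chromosome fragment counts from the log it reproduces the per-chromosome values (e.g. -2487765 for chr1, -733191 for chr10) and the total of -19082983.

```shell
#!/usr/bin/env bash
# Arithmetic illustration only; this is NOT fithic's implementation.
# On one chromosome with n equal bins, n - k bin pairs are k bins apart.
# pairs_up_to n K returns the closed form of sum_{k=1..K} (n - k).
pairs_up_to() {
    local n=$1 K=$2
    echo $(( n * K - K * (K + 1) / 2 ))
}

# Counting separations only up to the last real bin (K = n - 1)
# gives n(n-1)/2: prints 31125 for the 250 bins of chr1.
pairs_up_to 250 249

# Assumed cut-off K = 10n - 6 (hypothetical, back-solved from the log):
# K runs far past n, so most terms (n - k) are negative.
# Prints -2487765, the value in the chr1 log line.
pairs_up_to 250 2494

# Mappable fragments per chromosome, copied from the log (chr1 ... chrY).
total=0
for n in 250 136 136 134 116 108 103 91 82 79 60 244 64 49 52 \
         199 192 181 172 160 147 142 156 60; do
    total=$(( total + $(pairs_up_to "$n" $(( 10 * n - 6 ))) ))
done
echo "total: $total"   # prints: total: -19082983
```

This is consistent with the observation that the count turns positive once -r is set to the real bin size of 1000000.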

ay-lab commented 3 years ago

Thanks. We have now made the change in the test script.