PoonLab / vindels

Developing an empirical model of sequence insertion and deletion in virus genomes

Hypothesis Testing - Dinucleotide Proportions #72

Closed jpalmer37 closed 5 years ago

jpalmer37 commented 5 years ago

We decided to formulate an appropriate hypothesis to test with our new dinucleotide results, and to find a suitable statistical test for it.

Potential Ideas:

Current focus:

[Figures: ins-dinucl, del-dinucl]

jpalmer37 commented 5 years ago

Here are the results of a randomization test on the data, checking for significant deviations from the variable loop proportions (significant deviations are shown by larger/smaller text). None were found in insertions:

[Figure: ins-dinucl]

The only significant difference was a higher CG proportion in deletions. This appears to be driven by the extremely low CG content in the variable loops: the randomly sampled distribution is concentrated at zero, with both the 2.5th and 97.5th percentiles equal to zero, so even a small observed proportion comes out as significant.

 CG
 2.5% 97.5% 
    0     0 

[Figure: del-dinucl]
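
For reference, a minimal Python sketch of this kind of randomization test. It is not the analysis code used for these results. It assumes the null distribution is built by repeatedly drawing dinucleotides at random (with replacement) from the variable loops; the function name `null_interval`, the sample sizes, and the toy data are all hypothetical.

```python
import random

def null_interval(pool, n_draws, target="CG", n_reps=1000, seed=1):
    """Approximate 2.5th/97.5th percentiles of the target dinucleotide's
    proportion across random samples drawn from `pool`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    props = []
    for _ in range(n_reps):
        sample = rng.choices(pool, k=n_draws)  # sample with replacement
        props.append(sample.count(target) / n_draws)
    props.sort()
    # Simple index-based percentiles (no interpolation).
    lo = props[int(0.025 * (n_reps - 1))]
    hi = props[int(0.975 * (n_reps - 1))]
    return lo, hi

# Toy pool (hypothetical): CG is extremely rare, as in the variable loops.
pool = ["AT"] * 99999 + ["CG"]

# Null interval for samples of 20 dinucleotides.
lo, hi = null_interval(pool, n_draws=20)

# Hypothetical observation: a single CG among 20 deleted dinucleotides.
observed = 1 / 20
is_significant = observed < lo or observed > hi
print(lo, hi, is_significant)
```

When the target is this rare, almost every random sample contains none of it, so both percentiles collapse to zero and any nonzero observed proportion falls outside the interval.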

ArtPoon commented 5 years ago

Try a paired t-test on the insertion and deletion dinucleotide frequency residuals.

jpalmer37 commented 5 years ago

I performed a paired t-test on insertion and deletion residuals and found no significant difference between them.

Paired t-test

data:  resid by indel
t = 1.4133e-15, df = 15, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.01013769  0.01013769
sample estimates:
mean of the differences 
           6.722053e-18 

This seems to be supported by the highly similar distributions of these residuals:

[Figure: dinucl-resid]
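
For reference, a minimal Python sketch of the paired t statistic, using only the standard library. It is not the code behind the output above. It assumes each residual is an observed proportion minus an expected proportion for the same dinucleotide; the function name `paired_t` and the four-category toy data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t, df) for a paired t-test of H0: mean(x - y) = 0."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Toy data (hypothetical): four categories instead of 16 dinucleotides.
expected = [0.25, 0.25, 0.25, 0.25]
ins_obs = [0.30, 0.20, 0.28, 0.22]
del_obs = [0.24, 0.27, 0.21, 0.28]

# Residuals: observed minus expected proportion for each category.
ins_resid = [o - e for o, e in zip(ins_obs, expected)]
del_resid = [o - e for o, e in zip(del_obs, expected)]

t, df = paired_t(ins_resid, del_resid)
print(t, df)
```

If the residuals are defined this way over a complete set of categories, each set of residuals sums to zero (up to floating-point error), so the mean paired difference is zero by construction and t is approximately zero however much the two profiles differ. That would be consistent with t = 1.4133e-15 and p-value = 1 above, but it rests on an assumption about how the residuals were computed. A p-value could be obtained from the t distribution, for example with `scipy.stats.ttest_rel`.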

jpalmer37 commented 5 years ago

In our meeting, you mentioned that there isn't much to investigate in these data, given the lack of variation shown in these plots. I'll close this issue for now.