Closed: jpalmer37 closed this issue 5 years ago
These are the results of a randomization test on the data, checking for significant deviations from the variable loop proportions (shown by larger/smaller text). None were found in insertions. The only significant difference was a higher CG proportion in deletions. This appears to be driven by the extremely low CG content in the variable loops: the randomly sampled distribution is concentrated at zero, even at the 97.5th percentile, so even a small observed proportion registers as significant.
CG: 2.5% quantile = 0, 97.5% quantile = 0
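To illustrate why both quantiles can collapse to zero, here is a minimal sketch of one possible randomization scheme. The scheme itself is an assumption (the actual sampling procedure is not described in this issue), and `dinucleotide_proportion`, `null_quantiles`, and the toy background pool are hypothetical names and values, not the real variable loop data.

```python
import random

def dinucleotide_proportion(seq, dinuc):
    """Proportion of overlapping dinucleotides in seq that equal dinuc."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return pairs.count(dinuc) / len(pairs) if pairs else 0.0

def null_quantiles(background, sample_len, dinuc, n_iter=2000, seed=1):
    """Approximate 2.5% and 97.5% quantiles of the dinucleotide proportion
    in random sequences drawn (with replacement) from a background pool.
    Assumed scheme: nucleotides are sampled independently."""
    rng = random.Random(seed)
    props = sorted(
        dinucleotide_proportion(
            "".join(rng.choices(background, k=sample_len)), dinuc
        )
        for _ in range(n_iter)
    )
    return props[int(0.025 * (n_iter - 1))], props[int(0.975 * (n_iter - 1))]

# Hypothetical background pool with very low C and G content.
background = "A" * 48 + "T" * 48 + "CCGG"
lo, hi = null_quantiles(background, sample_len=6, dinuc="CG")

# A single CG in a short indel already falls outside a [0, 0] interval.
observed = dinucleotide_proportion("ATCGAT", "CG")
significant = observed < lo or observed > hi
```

With a null interval this degenerate, any indel containing a single CG is flagged, so the result says more about the background composition than about the indels.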
Try a paired t-test on the insertion and deletion dinucleotide frequency residuals.
I performed a paired t-test on insertion and deletion residuals and found no significant difference between them.
Paired t-test
data: resid by indel
t = 1.4133e-15, df = 15, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.01013769 0.01013769
sample estimates:
mean of the differences
6.722053e-18
This seems to be confirmed by the highly similar distributions of these residuals.
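For reference, a small sketch of the statistic behind that output. The reported df = 15 matches 16 paired dinucleotides, and the mean difference of 6.7e-18 is zero to floating-point precision. One possible explanation, stated here as an assumption rather than something confirmed in this issue: if the residuals in each group average to zero, the mean paired difference is zero by construction, so t is about 0 and p is about 1 no matter how the individual residuals differ. The helper names and toy numbers below are hypothetical.

```python
import math
import statistics

def paired_t(x, y):
    """Return (t, df, mean difference) for a paired t-test of x against y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_d / se, n - 1, mean_d

def center(values):
    """Subtract the group mean, mimicking residuals that average to zero."""
    m = statistics.mean(values)
    return [v - m for v in values]

# Hypothetical toy values, not the real residuals.
ins_resid = center([0.10, 0.02, 0.07, 0.05])
del_resid = center([0.01, 0.09, 0.03, 0.08])

t_stat, df, mean_diff = paired_t(ins_resid, del_resid)
# The two groups differ pair by pair, yet t_stat is ~0 because both are centered.
```

If that is what happened here, the paired t-test cannot detect a difference between the groups, and comparing the spread or the per-dinucleotide differences would be more informative.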
In our meeting, you mentioned that there isn't much to investigate within these data, given the lack of variation shown in these plots. I'll close this issue for now.
We decided that we want to figure out an appropriate hypothesis to test with our new dinucleotide results and also find an appropriate statistical test for it.
Potential Ideas:
Are proportions of nucleotides / dinucleotides in insertions significantly more variable than the same proportions in deletions? (We were uncertain whether to use deletions as a reference group, and we did not think of a specific statistical test for this.)
Test whether the presence of a C in a dinucleotide correlates with significantly higher proportions in insertions / deletions (tried a signed-rank test; still looking for a parametric test).
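As a starting point for the second idea, one parametric candidate is Welch's two-sample t-test comparing C-containing dinucleotides against the rest. This is only a sketch of that option, not a test we have settled on. `split_by_c` and `welch_t` are hypothetical helper names, and the toy numbers are placeholders. The sketch returns the statistic and approximate degrees of freedom only, so the p-value would still need a t distribution lookup (for example `pt` in R).

```python
import math
import statistics

def split_by_c(values):
    """Split a {dinucleotide: value} mapping into C-containing and other groups."""
    with_c = [v for d, v in values.items() if "C" in d]
    without_c = [v for d, v in values.items() if "C" not in d]
    return with_c, without_c

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na  # squared standard error, group a
    vb = statistics.variance(group_b) / nb  # squared standard error, group b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# All 16 dinucleotides; 7 of them contain a C.
dinucs = [a + b for a in "ACGT" for b in "ACGT"]

# Hypothetical placeholder values, not real proportions.
toy = {d: (i % 5) / 100 for i, d in enumerate(dinucs)}
with_c, without_c = split_by_c(toy)
t_stat, df = welch_t(with_c, without_c)
```

With only 7 and 9 values per group the normality assumption is hard to check, which is part of why the signed-rank test was tried first.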
Current focus: