Looks like SciPy itself doesn't have a one-tailed t-test implementation with a "difference threshold" value, so I turned to a different library called statsmodels. I tried to comment heavily to explain my reasoning behind the two tests.
Here's what the output currently looks like for the data set I currently have:
I'm printing the mean frame-rate difference to help guide intuition; it shows, for example, that there is a nontrivial difference for shadow_map.
The test I'm currently running checks whether the two means are equivalent within a margin of 1.0 fps. In TOST, H0 is that the means differ by at least 1 fps, so rejecting it (a significant p-value) means the difference is smaller than that.
The asterisk next to the TOST p-value marks statistical significance at a threshold of α = 0.05. With this setup, equivalence is established for everything but shadow_map, where the fps difference is clearly bigger than 1.
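For reference, a minimal sketch of what the statsmodels call looks like, using `ttost_ind` from `statsmodels.stats.weightstats` (the data here is synthetic, not our benchmark output):

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(0)
# Synthetic fps samples: two runs whose true means differ by ~0.1 fps
baseline = rng.normal(60.0, 0.5, size=100)
candidate = rng.normal(60.1, 0.5, size=100)

# TOST with margin ±1.0 fps: H0 is |mean difference| >= 1.0.
# The reported p-value is the larger of the two one-sided t-test p-values.
p, lower, upper = ttost_ind(baseline, candidate, -1.0, 1.0)
equivalent = p < 0.05  # significant => means are within 1 fps of each other
```

`lower` and `upper` also carry the individual one-sided t statistics and p-values if you want to print them alongside the combined result.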
This is for our new "two one-sided tests" equivalence check procedure. It uses t-tests for now; it may be possible to adapt it to Wilcoxon tests.
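One possible Wilcoxon-flavored adaptation (an untested sketch, my own naming): run the same two one-sided tests with `scipy.stats.mannwhitneyu`, the Mann-Whitney U form of the Wilcoxon rank-sum test, shifting one sample by the margin before each comparison:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def tost_mannwhitney(x1, x2, margin):
    """Hypothetical nonparametric TOST for two independent samples.

    Tests whether the location shift between x1 and x2 lies within
    (-margin, +margin); returns the larger of the two one-sided p-values.
    """
    # H0: shift <= -margin; reject when x1 + margin tends to exceed x2
    _, p_lower = mannwhitneyu(x1 + margin, x2, alternative="greater")
    # H0: shift >= +margin; reject when x1 - margin tends to fall below x2
    _, p_upper = mannwhitneyu(x1 - margin, x2, alternative="less")
    return max(p_lower, p_upper)

rng = np.random.default_rng(1)
p = tost_mannwhitney(rng.normal(60.0, 0.5, 100),
                     rng.normal(60.1, 0.5, 100),
                     margin=1.0)
```

Taking the max of the two one-sided p-values mirrors how the t-test TOST combines its halves, so the α = 0.05 threshold would apply the same way.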