alanyuchenhou closed 7 years ago
The experiment needs to include multiple trials for each data set, with low variance in accuracy across trials, in order to make a strong claim that Model R is better than WSBM and its variants.
Use a t-test to compare the models and make a stronger claim.
Found the SciPy implementation of the t-test: scipy.stats.ttest_ind_from_stats. Confirmed from the docs, which reference the wiki page on Student's t-test, that this is Student's t-test. Decided to choose equal_var=False (i.e., it does not assume equal population variance; per the SciPy docs, this setting performs Welch's variant of the test).
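A minimal sketch of the unpaired comparison described above, assuming only per-model summary statistics are at hand. The means, standard deviations, and trial counts are hypothetical placeholders, not results from this experiment. The standard deviations are assumed to be the corrected (ddof=1) sample values, which is what scipy.stats.ttest_ind_from_stats expects.

```python
from math import sqrt

from scipy.stats import ttest_ind_from_stats

# Hypothetical summary statistics (NOT results from this experiment):
# mean accuracy, corrected sample std (ddof=1), and number of trials.
mean_r, std_r, n_r = 0.82, 0.02, 10  # Model R
mean_w, std_w, n_w = 0.78, 0.03, 10  # WSBM

# equal_var=False does not assume equal population variance (Welch's t-test).
t_stat, p_two_sided = ttest_ind_from_stats(
    mean_r, std_r, n_r,
    mean_w, std_w, n_w,
    equal_var=False,
)

# Same statistic computed by hand, as a sanity check on the formula.
t_manual = (mean_r - mean_w) / sqrt(std_r**2 / n_r + std_w**2 / n_w)

# The function reports a two-sided p-value. For the one-sided question
# "is Model R better?", halve it when t points in the hypothesized direction.
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

Because this works from summary statistics alone, it can be run without keeping the per-trial accuracies, at the cost of ignoring any pairing between trials.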
To be specific, this is a one-tailed, paired t-test. One-tailed, because we want to know if method 1 is better than method 2. A two-tailed test would be used if we just wanted to know whether the methods are different. Paired means that the data used for each trial is the same for method 1 and method 2. When you compute the t-statistic, make sure you use the correct formula (paired) and compare to the right threshold (one-tailed).
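As a rough illustration of the one-tailed, paired test described above, the sketch below computes the paired t-statistic from per-trial differences and takes the upper-tail probability. The accuracy lists are hypothetical placeholders, and it assumes trial i of both methods used the same data. scipy.stats.ttest_rel is used only as a cross-check; it reports a two-sided p-value by default.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t as t_dist, ttest_rel

# Hypothetical per-trial accuracies (NOT results from this experiment).
# Trial i of both methods is assumed to use the same data, hence "paired".
acc_method1 = [0.81, 0.84, 0.79, 0.83, 0.82]
acc_method2 = [0.78, 0.80, 0.77, 0.80, 0.80]

# Paired formula: t = mean(d) / (sd(d) / sqrt(n)), where d are the differences.
diffs = [a - b for a, b in zip(acc_method1, acc_method2)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# One-tailed p-value: probability of a t at least this large, with n - 1 df.
p_one_tailed = t_dist.sf(t_stat, df=n - 1)

# Cross-check against SciPy's paired test (two-sided by default).
t_check, p_two_tailed = ttest_rel(acc_method1, acc_method2)

print(f"t = {t_stat:.3f}, one-tailed p = {p_one_tailed:.5f}")
```

With a positive t-statistic, the one-tailed p-value is half of the two-sided one, which is why comparing against the two-tailed threshold would understate the evidence.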
Got it. I noticed the difference. I'll keep working on the paired one, but meanwhile let me also do the unpaired one, because it's very cost-efficient: I already have all the data it needs. I think it can still help me make a stronger claim if the p-value is small enough, even if it's not as strong as the paired one.
experiment settings
experiment results