alanyuchenhou / elephant

MIT License

weight prediction experiments #36

Closed alanyuchenhou closed 7 years ago

alanyuchenhou commented 7 years ago

experiment settings

| data set | WSBM (weighted stochastic block model) | Model R |
|---|---|---|
| airport | 0.0486 | 0.0136 |
| collaboration | 0.0407 | 0.0352 |
| congress | 0.0571 | 0.0560 |
| forum | 0.0726 | 0.0326 |
alanyuchenhou commented 7 years ago

The experiment needs multiple trials per data set, with low variance in accuracy across trials, to support a strong claim that Model R is better than WSBM and its variants.
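A minimal sketch of summarizing repeated trials, assuming each trial yields one error value per model on the same data set. The per-trial numbers and the `summarize` helper are hypothetical, invented for illustration. The mean, the sample standard deviation (ddof=1), and the trial count are the three per-model summary statistics that a summary-statistics t-test takes as input, and the standard deviation is also the number to watch for "low variance".

```python
from statistics import mean, stdev

# Hypothetical per-trial test errors for one data set (e.g. 5 random splits).
trials = {
    "WSBM": [0.0570, 0.0585, 0.0562, 0.0579, 0.0568],
    "Model R": [0.0561, 0.0570, 0.0555, 0.0566, 0.0560],
}


def summarize(errors):
    """Hypothetical helper: return (mean, sample std with ddof=1, number of trials)."""
    return mean(errors), stdev(errors), len(errors)


summary = {model: summarize(errs) for model, errs in trials.items()}
for model, (m, s, n) in summary.items():
    print(f"{model}: {m:.4f} +/- {s:.4f} over {n} trials")
```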

alanyuchenhou commented 7 years ago

Use a t-test to compare the models, to make a stronger claim.

alanyuchenhou commented 7 years ago

- found the scipy implementation of the t-test: `scipy.stats.ttest_ind_from_stats`
- confirmed this is Student's t-test from the docs' reference to the Wikipedia page on Student's t-test
- decided to choose `equal_var=False` (i.e., do not assume equal population variances); with this setting scipy performs Welch's t-test, the unequal-variance variant of Student's t-test
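A sketch of the call, assuming SciPy >= 1.6 (needed for the `alternative=` argument). The means echo the congress row of the table, but the standard deviations and trial counts are invented, so the resulting p-values say nothing about the real experiment. The hand computation of Welch's statistic and the Welch-Satterthwaite degrees of freedom is there only to show what `equal_var=False` does.

```python
import math

from scipy import stats

# Hypothetical summary statistics: (mean error, sample std with ddof=1, trials).
# Means echo the congress row; stds and trial counts are invented.
mean_r, std_r, n_r = 0.0560, 0.0030, 10  # Model R
mean_w, std_w, n_w = 0.0571, 0.0020, 10  # WSBM

# Welch's t-test (equal_var=False); two-sided by default.
res = stats.ttest_ind_from_stats(mean_r, std_r, n_r, mean_w, std_w, n_w,
                                 equal_var=False)

# Same statistic by hand: squared standard error of each mean...
v_r, v_w = std_r**2 / n_r, std_w**2 / n_w
t_manual = (mean_r - mean_w) / math.sqrt(v_r + v_w)
# ...and the Welch-Satterthwaite degrees of freedom.
df = (v_r + v_w) ** 2 / (v_r**2 / (n_r - 1) + v_w**2 / (n_w - 1))
p_two_sided = 2 * stats.t.sf(abs(t_manual), df)

# One-tailed version, H1: "Model R error < WSBM error".
res_less = stats.ttest_ind_from_stats(mean_r, std_r, n_r, mean_w, std_w, n_w,
                                      equal_var=False, alternative="less")
p_one_sided = stats.t.cdf(t_manual, df)

print(res.statistic, res.pvalue, res_less.pvalue)
```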

ghost commented 7 years ago

To be specific, this is a one-tailed, paired t-test. One-tailed, because we want to know whether method 1 is better than method 2; a two-tailed test would be used if we only wanted to know whether the methods differ. Paired means that the data used for each trial is the same for method 1 and method 2. When you compute the t-statistic, make sure you use the correct (paired) formula and compare it to the right (one-tailed) threshold.
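A minimal sketch of the paired, one-tailed test, assuming trial i uses the same data split for both models and that lower error is better. The per-trial errors are hypothetical. The paired formula works on the per-trial differences d_i, with t = mean(d) / (s_d / sqrt(n)) and n - 1 degrees of freedom; the one-tailed threshold is `t.ppf(1 - alpha, df)` rather than the two-tailed `t.ppf(1 - alpha/2, df)`. The result is cross-checked against `scipy.stats.ttest_rel` (assuming SciPy >= 1.6 for `alternative=`).

```python
import math
from statistics import mean, stdev

from scipy import stats

# Hypothetical per-trial errors; trial i uses the same data split for both models.
wsbm = [0.0570, 0.0585, 0.0562, 0.0579, 0.0568]
model_r = [0.0561, 0.0570, 0.0555, 0.0566, 0.0560]

# Paired test: work on the per-trial differences, not on the two samples.
diffs = [r - w for r, w in zip(model_r, wsbm)]
n = len(diffs)
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(n))
df = n - 1

# One-tailed, H1: mean difference < 0 (Model R has lower error).
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha, df)  # one-tailed critical value
reject_h0 = t_paired < -t_crit
p_one_tailed = stats.t.cdf(t_paired, df)

# SciPy's paired test should agree with the hand computation.
res = stats.ttest_rel(model_r, wsbm, alternative="less")
print(t_paired, p_one_tailed, reject_h0)
```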

alanyuchenhou commented 7 years ago

Got it, I noticed the difference. I'll keep working on the paired one, but meanwhile let me also do the unpaired one, because it's very cheap: I already have all the data it needs. I think it can still help me make a stronger claim if the p-value is small enough, even if it's not as strong as the paired test.
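The two tests can be compared on the same hypothetical trials. This sketch, assuming SciPy >= 1.6 for `alternative=`, feeds only per-model summary statistics to the unpaired Welch test and the raw per-trial errors to the paired test. When both models' errors rise and fall together across trials (for example, because some data splits are harder), the paired test cancels that shared variation, which tends to make it more sensitive.

```python
from statistics import mean, stdev

from scipy import stats

# Hypothetical per-trial errors; trial i uses the same data split for both models.
wsbm = [0.0570, 0.0585, 0.0562, 0.0579, 0.0568]
model_r = [0.0561, 0.0570, 0.0555, 0.0566, 0.0560]

# Unpaired (Welch) test from summary statistics only; the pairing is ignored.
unpaired = stats.ttest_ind_from_stats(
    mean(model_r), stdev(model_r), len(model_r),
    mean(wsbm), stdev(wsbm), len(wsbm),
    equal_var=False, alternative="less")

# Paired test on the raw per-trial errors.
paired = stats.ttest_rel(model_r, wsbm, alternative="less")

print(f"unpaired p = {unpaired.pvalue:.4f}, paired p = {paired.pvalue:.4f}")
```

With these invented numbers the paired p-value comes out smaller than the unpaired one, because the per-trial differences vary much less than the raw errors do.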