I'm not sure KS test in RQ3 is right or wrong

ai-se / tunelearners


I'm not sure KS test in RQ3 is right or wrong #25

Closed WeiFoo closed 9 years ago

WeiFoo commented 9 years ago

I added the KS results to RQ3. What I did was pass the numbers from the Tuned CART and Tuned RF columns to the KS test. Is that right?

timm commented 9 years ago

can't see those results

and do you mean RQ3... which is about discrete values... or RQ4... which is about the numeric distributions of fig4?

and if you mean fig4, are you saying that statistically, there is no difference in the curves shown in fig4, as determined by KS?

WeiFoo commented 9 years ago

In section 4.2, in the next-to-last paragraph, I added a sentence about the KS test done on Tables 4 and 5.

timm commented 9 years ago

what about the fig4 nums. are they insignificantly different?

WeiFoo commented 9 years ago

In fig4, if the D value from the KS test is greater than 1.36 x sqrt(34/(17*17)) = 0.46, we can reject the hypothesis that the two samples come from the same distribution, i.e. they're different distributions.

Precision: CART vs WHERE D = 0.4118, CART vs RF D = 0.2941, RF vs WHERE D = 0.1765
F: WHERE vs RF D = 0.2941, WHERE vs CART D = 0.2352, RF vs CART D = 0.2353

According to these D values, none of them are significantly different from each other!
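For anyone who wants to re-check the arithmetic, here is a minimal sketch of the threshold comparison. It assumes the large-sample two-sample KS critical value `c(alpha) * sqrt((n + m) / (n * m))` with `c(0.05) = 1.36` and `n = m = 17`, as implied by the formula above. The function and variable names are made up for illustration, and the D values are simply the ones listed in this comment. Computed this way the threshold comes out near 0.466, which is quoted above as 0.46.

```python
import math


def ks_critical_value(n, m, c_alpha=1.36):
    """Large-sample two-sample KS critical value.

    c_alpha = 1.36 corresponds to alpha = 0.05 (assumption: the
    asymptotic approximation is acceptable for samples this small).
    """
    return c_alpha * math.sqrt((n + m) / (n * m))


# D values copied from this comment; n = m = 17 is taken from the formula.
d_values = {
    "precision": {
        "CART vs WHERE": 0.4118,
        "CART vs RF": 0.2941,
        "RF vs WHERE": 0.1765,
    },
    "F": {
        "WHERE vs RF": 0.2941,
        "WHERE vs CART": 0.2352,
        "RF vs CART": 0.2353,
    },
}

threshold = ks_critical_value(17, 17)

# Collect every pair whose D exceeds the threshold (i.e. significantly different).
significant = [
    (measure, pair)
    for measure, pairs in d_values.items()
    for pair, d in pairs.items()
    if d > threshold
]

print(f"threshold = {threshold:.4f}")
print("significantly different pairs:", significant)
```

With these inputs no pair exceeds the threshold, which matches the conclusion above. The closest is CART vs WHERE on precision.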

timm commented 9 years ago

what is the threshold for "different"?

WeiFoo commented 9 years ago

0.46

timm commented 9 years ago

Precision

| | RF | WHERE |
|---|---|---|
| CART | 0.29 | 0.41 |
| RF | | 0.18 |
| WHERE | | |

F

| | RF | WHERE |
|---|---|---|
| CART | 0.24 | 0.24 |
| RF | | 0.29 |
| WHERE | | |

timm commented 9 years ago

can you add the above tables to the paper

WeiFoo commented 9 years ago

Sure, I will do it.

WeiFoo commented 9 years ago

added! right below fig.4


WeiFoo commented 9 years ago

This observation is supported by the KS results of Table 7.
At a 95% confidence, the KS threshold is 1.36 x sqrt(34/(17*17)) = 0.46,
which is greater than the values in Figure 4.
That is, no result in Figure 4 is significantly different to any
other, which is to say that there is no evidence that np=10
is a poor choice of search space size.

Here we do KS on the improvements of different learners, and our conclusion is that no result in Figure 4 is significantly different to any other.

Do we need to do the KS test on the results of each learner with np = 10 versus the recommended np? Then the conclusion would be that, for each learner, tuning with np = 10 is not significantly different to tuning with any recommended np.

To me, the above two conclusions are slightly different.
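To make the second option concrete, here is a rough sketch of what a per-learner check could look like: compute the two-sample KS statistic D between one learner's scores at np = 10 and its scores at a recommended np, then compare D to the threshold. The function name `ks_statistic` and the sample lists are hypothetical placeholders, not numbers from the paper.

```python
import bisect


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    d = 0.0
    # The largest gap is always attained at one of the observed values.
    for v in xs_sorted + ys_sorted:
        cdf_x = bisect.bisect_right(xs_sorted, v) / n
        cdf_y = bisect.bisect_right(ys_sorted, v) / m
        d = max(d, abs(cdf_x - cdf_y))
    return d


# Placeholder scores for ONE learner (made-up values, for illustration only).
scores_np10 = [1, 2, 3, 4]
scores_recommended_np = [3, 4, 5, 6]

d = ks_statistic(scores_np10, scores_recommended_np)
print("D =", d)
```

Run once per learner, this would support the per-learner conclusion, whereas the current Table 7 compares learners with each other.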

timm commented 9 years ago

i thought your treatments were

np=10 vs np=EverythingElse

if so, our current results are fine.

but please advise