andrewrk / poop

Performance Optimizer Observation Platform
MIT License

Welch T-Test #2

Open Kuratius opened 1 year ago

Kuratius commented 1 year ago

https://en.wikipedia.org/wiki/Welch%27s_t-test

It's basically a way to boil two sets of measurements down to a single number, t, which you can then look up in a table to see how likely a difference that large would be if your change had no real effect. Computing t is not difficult, but getting from t to a probability is a bit trickier. I have to look that part up again; I don't want to say something wrong and it's quite late for me. It would be a good idea to implement this, since it would give people something they can actually use without a statistics background or a lookup table of t values next to their keyboard.
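For reference, here is a minimal sketch of the "computing t" part in Python, using only the standard library. The function name welch_t and the timing numbers are made up for illustration; this is not code from poop. It also returns the Welch-Satterthwaite degrees of freedom, which are needed later to turn t into a probability.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two measurements")
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical wall-clock times in milliseconds, invented for illustration.
baseline = [102.1, 98.7, 105.3, 99.8, 101.4, 103.9]
patched = [96.2, 94.8, 99.1, 95.5, 97.3, 93.9]
t, df = welch_t(baseline, patched)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The degrees of freedom always land between the smaller sample size minus one and the combined size minus two, so they are usually not an integer.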

SciPy's implementation returns this as the p-value (pass equal_var=False to get Welch's test rather than Student's). https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
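If pulling in SciPy is not an option, the step from t to a two-sided p-value can be approximated by integrating the Student's t density numerically. The sketch below is an assumption about how one might do it with the standard library only; it is not how SciPy computes it, and the names t_pdf and two_sided_p are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: probability of |T| >= |t| if there is no real difference."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals.
    # Composite Simpson's rule for the integral of the density over [0, |t|].
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The density is symmetric, so the two tails together are 1 - 2 * central_half.
    return max(0.0, 1.0 - 2.0 * central_half)

# t = 2.228 with 10 degrees of freedom is the textbook 5% critical value.
print(f"p = {two_sided_p(2.228, 10):.4f}")
```

A small p-value says the observed difference would be unlikely if the change had no effect; it is not the probability that the change helped.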

It's probably only useful for programs whose runtime varies a lot, when you want to test whether a change made a measurable difference but can only afford a few dozen runs instead of the thousands you would otherwise need to be confident.

The reason I mentioned it in the stream is that someone brought up checking whether one value is within a standard deviation of the other. This is basically a more sophisticated version of that.
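To illustrate the difference, here is a small sketch with synthetic, deterministic data (all names and numbers are invented for illustration). The "within one standard deviation" check gives the same answer no matter how many runs were taken, whereas the Welch t statistic grows as more runs back up the same shift in the mean.

```python
import math
from statistics import mean, stdev, variance

def within_one_sd(a, b):
    """Naive check: do the means differ by no more than the larger sample SD?"""
    return abs(mean(a) - mean(b)) <= max(stdev(a), stdev(b))

def welch_t_stat(a, b):
    """Welch's t statistic: mean difference over its estimated standard error."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def make_samples(n):
    """Synthetic timings cycling through 98..102, plus a copy shifted by +1."""
    base = [100.0 + (i % 5 - 2) for i in range(n)]
    shifted = [x + 1.0 for x in base]
    return base, shifted

for n in (5, 50):
    base, shifted = make_samples(n)
    print(n, within_one_sd(base, shifted), round(welch_t_stat(base, shifted), 2))
```

With 5 runs per side the statistic is about -1, with 50 runs it is about -3.5, yet the naive check reports "within one standard deviation" in both cases.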

matu3ba commented 7 months ago

"Performance of five two-sample location tests for skewed distributions with unequal variances" by "Morten W. Fagerland, Leiv Sandvik" is a good read. It is called Welch's U test there.

To be scientifically correct, one would need to generate enough data to check whether the test is applicable. However, that is often impractical because of side effects or long run times.

Otherwise, one should provide an opt-in explanation.