andrewrk / poop

Performance Optimizer Observation Platform
MIT License

reexamine the conditions for marking the ratio insignificant #7

Open andrewrk opened 1 year ago

andrewrk commented 1 year ago

Currently it is done like this:

https://github.com/andrewrk/poop/blob/b01058a8081e5584f1c28f74c59643e413b562df/src/main.zig#L365

but maybe there is a better way to do it, one that is more widely accepted in the field of statistics.

(perhaps std deviation should be involved?)

matu3ba commented 9 months ago

> perhaps std deviation should be involved?

  1. To be statistically correct, you would first have to run enough retries to estimate whether the standard deviation is the right measure for the observed distribution: https://stats.stackexchange.com/questions/108578/what-does-standard-deviation-tell-us-in-non-normal-distribution. This may or may not be practical depending on system behavior (interference from other processes may make the exact distribution unobservable in practice), so this step is usually skipped and a Gaussian distribution is assumed.
  2. The next step is to estimate confidence intervals. For the theory see https://medium.com/swlh/a-simple-refresher-on-confidence-intervals-1e29a8580697, and for a more practical motivation see https://www.brainvitamins.net/blog/confidence-intervals-for-benchmarks/. However, the Welch test with abbreviated context and an optional explanation would be much better, see #2.
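
For reference, here is a minimal sketch of the Welch test mentioned in point 2, written in Python with only the standard library. It is not what poop does today; the function name `welch_t` and the sample values are made up for illustration. It computes the t statistic and the Welch-Satterthwaite degrees of freedom for two sets of measurements, assuming (as in point 1) that both are roughly Gaussian.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    Hypothetical helper for illustration; `a` and `b` are sequences of
    measurements (e.g. wall-clock times of two commands).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two measurements")

    mean_a = statistics.fmean(a)
    mean_b = statistics.fmean(b)

    # Sample variance (n - 1 denominator) divided by n gives the squared
    # standard error of each mean.
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")

    # t statistic: difference of means over the combined standard error.
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up measurements, not real benchmark data.
    baseline = [10.0, 12.0, 14.0]
    candidate = [20.0, 22.0, 24.0]
    t, df = welch_t(baseline, candidate)
    print(f"t = {t:.4f}, df = {df:.2f}")
```

Turning `t` and `df` into a p-value requires the Student's t distribution CDF, which the Python standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly. The difference would then be marked insignificant when the p-value exceeds a chosen threshold such as 0.05.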