[📍] Adjust significance thresholds

In http://go/pinpoint-data, I have a chart showing how our two significance thresholds divide the results into three regions. From top to bottom: the two distributions are the same, we need more data, and the two distributions are different. One thing that has bothered me about this diagram is that the top threshold is a straight line, which doesn't match the shape of the curves below it.

Instead of using a straight line, I ran 100k simulations of Mann-Whitney U (MWU) p-values on two normal distributions whose means differ by 1σ, then took the 90th, 99th, and 99.9th percentiles of those p-values at each repeat count from 1 to 120. That is, 99.9% of results (assuming the above distributions) will fall under the top curve in the chart below. The higher the percentile, the more the curve differs from a straight line.

The old threshold is shown here as a dashed line for comparison. (The bottom threshold can remain the same. As before, 99.9% of results will fall above the straight line at 0.001.)
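To make the three regions concrete, here is a minimal sketch of how a p-value could be classified against a flat bottom threshold and a top threshold that varies with repeat count. The function name `classify`, the table `PLACEHOLDER_HIGH_THRESHOLDS`, and the numbers in it are assumptions for illustration only; they are not simulation output and not Pinpoint's actual code.

```python
LOW_THRESHOLD = 0.001  # Bottom threshold mentioned above.

# Hypothetical placeholder values, NOT simulation output. They stand in for a
# top threshold curve indexed by repeat count.
PLACEHOLDER_HIGH_THRESHOLDS = {10: 0.5, 20: 0.2, 40: 0.05}


def classify(p_value, repeat_count,
             high_thresholds=PLACEHOLDER_HIGH_THRESHOLDS):
    """Return 'different', 'same', or 'unknown' for one comparison."""
    if p_value < LOW_THRESHOLD:
        # Below the bottom line: the two distributions are different.
        return 'different'
    if p_value > high_thresholds[repeat_count]:
        # Above the top curve: the two distributions are the same.
        return 'same'
    # In between: we need more data.
    return 'unknown'
```

With a curved top threshold, the same p-value can land in different regions depending on how many repeats have been collected, which is the point of replacing the straight line.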

The simulation code I ran is here.

import sys

import numpy
from scipy import stats

from dashboard.pinpoint.models import mann_whitney_u

# One (a, b) pair of growing sample lists per simulation.
data = [([], []) for _ in range(100000)]

for repeat_count in range(1, 121):
  p_values = []
  for a, b in data:
    # Add one draw to each sample: a ~ N(0, 1) and b ~ N(1, 1), so the means
    # differ by 1 sigma.
    a.append(stats.norm.rvs())
    b.append(stats.norm.rvs(1))
    p_values.append(mann_whitney_u.MannWhitneyU(a, b))

  # Percentiles of the p-values across all simulations at this sample size.
  p900 = str(numpy.percentile(p_values, 90))
  p990 = str(numpy.percentile(p_values, 99))
  p999 = str(numpy.percentile(p_values, 99.9))
  # One tab-separated row per repeat count.
  print('\t'.join((str(repeat_count), p900, p990, p999)))
  sys.stdout.flush()
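For anyone without a catapult checkout, here is a rough standard-library-only sketch of the same experiment. It is not the code I ran: `mann_whitney_p` is a swapped-in normal approximation without continuity or tie corrections (so it is crude at very small sample sizes and may not match `mann_whitney_u.MannWhitneyU` exactly), `percentile` is a simple linear-interpolation helper, and the default simulation count is much smaller so it finishes quickly. All three function names and their parameters are assumptions for illustration.

```python
import math
import random


def mann_whitney_p(a, b):
    """Two-sided MWU p-value via the normal approximation (no corrections)."""
    n1, n2 = len(a), len(b)
    # U counts pairs where the value from a is below the value from b.
    # Ties count as half.
    u = 0.0
    for x in a:
        for y in b:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    mean = n1 * n2 / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / std
    return math.erfc(abs(z) / math.sqrt(2.0))


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def simulate(num_simulations=200, max_repeat_count=20, shift=1.0, seed=0):
    """Return (repeat_count, p90, p99, p99.9) rows for growing samples."""
    rng = random.Random(seed)
    data = [([], []) for _ in range(num_simulations)]
    rows = []
    for repeat_count in range(1, max_repeat_count + 1):
        p_values = []
        for a, b in data:
            # a ~ N(0, 1), b ~ N(shift, 1).
            a.append(rng.gauss(0.0, 1.0))
            b.append(rng.gauss(shift, 1.0))
            p_values.append(mann_whitney_p(a, b))
        rows.append((repeat_count,
                     percentile(p_values, 90),
                     percentile(p_values, 99),
                     percentile(p_values, 99.9)))
    return rows


if __name__ == '__main__':
    for row in simulate():
        print('\t'.join(str(value) for value in row))
```

With only a few hundred simulations the 99.9th percentile is close to the maximum observed p-value, so raise `num_simulations` before reading anything into the top curve.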

@perezju @simonhatch @anniesullie

catapult-project / catapult

[📍] Adjust significance thresholds #4508