Power Analysis Error - ValueError: f(a) and f(b) must have different signs

sms1097 commented 8 months ago

I'm trying to do the power analysis as found in the examples for NPS values. I have data that I shared in the attached file for Y_total and Yhat_total. I copied the code from the examples and modified it for my use case.

I do not understand the error I am getting. I am trying to report confidence intervals for the average value of the NPS. I can get this to work for percentages. Can you explain the error reason?

# Find n such that we reject H0: average NPS score <= 3 with probability 80% using a test at level alpha
num_experiments = 100
list_rand_idx = [
    np.random.permutation(n_total) for i in range(num_experiments)
]
alpha=0.05

def _to_invert_ppi(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Yhat = Yhat_total[rand_idx[:n]]
        _Y = Y_total[rand_idx[:n]]
        _Yhat_unlabeled = Yhat_total[rand_idx[n:]]

        ppi_ci = ppi_mean_ci(_Y, _Yhat, _Yhat_unlabeled, alpha=alpha)
        if ppi_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8

def _to_invert_classical(n):
    n = int(n)
    nulls_rejected = 0
    # Data setup
    for i in range(num_experiments):
        rand_idx = list_rand_idx[i]
        _Y = Y_total[rand_idx[:n]]

        classical_ci = classical_mean_ci(_Y, alpha=alpha)

        if classical_ci[0] > 3:
            nulls_rejected += 1
    return nulls_rejected / num_experiments - 0.8

n_ppi = int(brentq(_to_invert_ppi, 1, 1000, xtol=1))
n_classical = int(brentq(_to_invert_classical, 1, 1000, xtol=1))
print(
    f"The PPI test requires n={n_ppi} labeled data points to reject the null."
)
print(
    f"The classical test requires n={n_classical} labeled data points to reject the null."
)

trial_values.json

aangelopoulos commented 8 months ago

Can you copy paste the error?

sms1097 commented 8 months ago


ValueError                                Traceback (most recent call last)
Cell In[145], line 2
      1 n_ppi = int(brentq(_to_invert_ppi, 1, 1000, xtol=1))
----> 2 n_classical = int(brentq(_to_invert_classical, 1, 1000, xtol=1))
      3 print(
      4     f"The PPI test requires n={n_ppi} labeled data points to reject the null."
      5 )
      6 print(
      7     f"The classical test requires n={n_classical} labeled data points to reject the null."
      8 )

File [~/anaconda3/envs/airline/lib/python3.10/site-packages/scipy/optimize/_zeros_py.py:809](https://file+.vscode-resource.vscode-cdn.net/Users/seansmith/Documents/airline-absa/scripts/~/anaconda3/envs/airline/lib/python3.10/site-packages/scipy/optimize/_zeros_py.py:809), in brentq(f, a, b, args, xtol, rtol, maxiter, full_output, disp)
    807     raise ValueError(f"rtol too small ({rtol:g} < {_rtol:g})")
    808 f = _wrap_nan_raise(f)
--> 809 r = _zeros._brentq(f, a, b, xtol, rtol, maxiter, args, full_output, disp)
    810 return results_c(full_output, r, "brentq")

ValueError: f(a) and f(b) must have different signs```

aangelopoulos commented 8 months ago

This error is expected when your null hypothesis is too difficult to test given the number of samples. Can you try testing an easier null?

sms1097 commented 8 months ago

Yes I was able to get that to work but slightly confused by the behavior here.

For this example I can evaluate $H_0: X >= 3$ but not $H_0: X <= 3$. Could you help me understand why this is the case?

aangelopoulos commented 8 months ago

Can you reject 2? Looks like you should be able to based on the plot.

sms1097 commented 8 months ago

2 does not work for this example. I also tried making the values continuous (not ordinal), but that made no difference.

Could you provide an interpretation of the simulation we do here?

aangelopoulos commented 8 months ago

Interesting. Basically this simulation is asking at what n the confidence interval excludes H0. In other words, when the p value for H0 is smaller than alpha.

The error gets thrown when the binary search fails-that means one of two things:

No n can reject the null you chose.
All n reject the null you chose.

So if the search is failing, it's worth examining the p values to see which case you're in. If you're not hitting the cases I described, it signifies some sort of deeper problem.

Let me know if this helped?

sms1097 commented 8 months ago

Okay understand the interpretation but not sure I understand the parameters here. How does nulls_rejected / num_experiments - power translate to checking when the p value for H0 is smaller than alpha? What part of this relates to p value and significance level?

aangelopoulos commented 8 months ago

For any hypothesis, you can only expect to reject it with some probability: that is the power against that hypothesis. So we're searching for an n such that we can reject that hypothesis with power, say, 80%.

sms1097 commented 8 months ago

Okay so two remaining questions:

How can I check the p values as you suggested above using that code?
If we don't have the labeled data in advance, how would we conduct a power analysis to determine how much hand labeled data and machine labeled data to collect before our experience? Does peeking at the experiment play a role here?

Thanks for your help so far, it's much appreciated!

aangelopoulos commented 8 months ago

Hey! Sorry for the confusion.

The code above just checks whether the confidence interval contains the H0 value. So you should directly inspect the confidence intervals.
That's not possible without assumptions on the data. Your intervals are generally proportional in size to the standard deviation of Y-f. So if you assume stuff about the distribution of Y-f, you can predict how many samples you would need by looking at the resulting confidence interval sizes.

Happy to help!

aangelopoulos commented 7 months ago

Hey! I'll close this soon, unless you want to keep chatting.

sms1097 commented 7 months ago

Hey feel free to close, I'll open something else if I run into problems again.

aangelopoulos / ppi_py

Power Analysis Error - ValueError: f(a) and f(b) must have different signs #9