ACCLAB / DABEST-python

Data Analysis with Bootstrapped ESTimation
https://acclab.github.io/DABEST-python/
Apache License 2.0

DABEST calculation of median difference CIs often fails #129

Closed seamusholden closed 1 year ago

seamusholden commented 2 years ago

We are frequently running into a problem where, for sensible-looking datasets that are quantized but have large sample sizes, calculation of the median difference confidence interval fails in at least 50% of cases.

To summarise the tests below: I suspect there is a bug in either the acceleration or the bias for the BCa calculation in compute_interval_limits in confint_2group_diff.py, which is causing undefined confidence intervals for the median difference calculation. But I don't understand the BCa calculation well enough to diagnose the issue further. Any advice/help is appreciated.

I've attached some test data showing the issue: Data.csv

The data are quantized and somewhat skewed, so we mainly look at medians to summarize them. There are quite a few data points (hundreds per sample) and a decent spread, so we should not expect to run into the known issue with bootstrap median calculations where we mainly obtain the same median value for each bootstrap resample. One quick way to verify that is to count the distinct medians across bootstrap resamples, as in the sketch below.
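Here is an illustrative sketch of that check with synthetic quantized data (not DABEST code; the sample and threshold are made up):

import numpy as np

# Count the distinct medians produced by bootstrap resampling. Many
# distinct values means the degenerate "all bootstrap medians are
# identical" failure mode is not in play.
rng = np.random.default_rng(0)
sample = rng.integers(0, 20, size=300).astype(float)  # quantized, decent spread
boot_medians = np.array([np.median(rng.choice(sample, size=sample.size, replace=True))
                         for _ in range(5000)])
print(len(np.unique(boot_medians)))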

If we run the median difference analysis, the "B-Ref" comparison fails to calculate confidence intervals, even though we can see that the estimated PDF is clearly defined:

import numpy
import pandas
import dabest
import matplotlib.pyplot as plt

# Load the attached data and set up the three-group comparison.
df = pandas.read_csv('Data.csv')
life_dabest = dabest.load(data=df, x="X", y="Y",
                          idx=('Ref', 'A', 'B'))
plt.figure()
fig = life_dabest.median_diff.plot()

Outputs:

C:\Users\nsh167\Anaconda3\lib\site-packages\dabest\_classes.py:792: UserWarning: The lower limit of the BCa interval cannot be computed. It is set to the effect size itself. All bootstrap values were likely all the same.
  stacklevel=0)
C:\Users\nsh167\Anaconda3\lib\site-packages\dabest\_classes.py:797: UserWarning: The upper limit of the BCa interval cannot be computed. It is set to the effect size itself. All bootstrap values were likely all the same.
  stacklevel=0)

[plot: estimation plot in which the B-Ref median difference CI has zero width]

From looking at the code, the failure appears to be in calculating either the acceleration or the bias for the BCa calculation in compute_interval_limits in confint_2group_diff.py.

However, if we add a negligible amount of noise to the data (just enough that, strictly speaking, the data are no longer quantized, but not enough to affect the spread in any meaningful way), the CI calculation works fine. That said, I am not completely comfortable using this rather ad hoc approach as a workaround!

# Use a copy so the original DataFrame is not modified in place.
df2 = df.copy()
n = len(df2['Y'])
# Add negligible noise: enough to break the ties, too small to change the spread.
noise = numpy.random.normal(0, 0.0001, n)
df2['Y'] = df2['Y'] + noise

life_dabest2 = dabest.load(data=df2, x="X", y="Y",
                           idx=('Ref', 'A', 'B'))
plt.figure()
fig2 = life_dabest2.median_diff.plot()

[plot: with the noise added, all median difference CIs are computed normally]

As a further sanity check, I tested whether I could reproduce the issue in MATLAB, mainly to get an idea of whether it was more likely a statistics issue or a code issue. The standard DABEST-MATLAB package does not do median difference plots, so a while ago I hacked together a simple derivative package: https://github.com/HoldenLab/violinplusDABEST-Matlab

This package uses the MATLAB bootci function, which also uses the BCa method. Interestingly, the MATLAB function can calculate the CIs just fine, and the result looks identical to the second Python plot:

% Load the same CSV and rebuild the grouping used above.
tmp = importdata('Data.csv');
data = tmp.data;
cats = tmp.textdata(2:end, 1);
grouporder = {'Ref', 'A', 'B'};

violinplusdabest(data, cats, grouporder)

[plot: MATLAB violin plot with median difference CIs matching the noise-added Python result]

josesho commented 2 years ago

Hi @seamusholden ,

this is indeed curious, let me take a closer look and get back to you....

seamusholden commented 2 years ago

Thanks!

Jacobluke- commented 1 year ago

Hi @seamusholden ,

Thanks for your patience,

Since the median difference BCa CI doesn't work well with quantized data, we recommend that you use the mean difference instead.
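For example, reusing the life_dabest object from your first snippet, switching effect sizes is a one-line change:

# Mean difference instead of median difference on the same loaded data.
fig = life_dabest.mean_diff.plot()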

We are now working on an update that will give percentile confidence intervals for the median difference as a fallback for when the BCa calculation fails due to quantized data.
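A minimal sketch of such a fallback, assuming a 1-D array of bootstrap effect sizes (illustrative only, not the actual patch):

import numpy as np

def percentile_ci(bootstraps, alpha=0.05):
    # Percentile CI: take quantiles of the bootstrap distribution directly.
    # Unlike BCa, this needs no bias or acceleration terms, so it stays
    # defined even when every jackknife median is identical.
    return np.quantile(bootstraps, [alpha / 2, 1 - alpha / 2])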

In case you are interested, our reasoning is below.

Cheers, @Jacobluke- and @adamcc

Issue with median difference BCa

For the zero-width confidence interval issue, the problem lies not in the code per se, but in the nature of the data (quantized) and the way the BCa CI is calculated.

To calculate the BCa interval, we have to use the jackknife method to calculate the corresponding acceleration value; the jackknife method removes one element from the sample at a time and recomputes the desired statistic.

As the chosen effect size is the median difference (and because the data points in the sample are crowded at the median), the jackknife method always generates identical values for the median. This makes the denominator in the acceleration calculation zero. Because the resulting acceleration value is undefined, the BCa confidence interval ends up with a width of zero.
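A minimal sketch of that failure (illustrative, not DABEST's exact code):

import numpy as np

def jackknife_acceleration(x, stat=np.median):
    # Leave-one-out ("jackknife") estimates of the statistic.
    n = len(x)
    jack = np.array([stat(np.delete(x, i)) for i in range(n)])
    diffs = jack.mean() - jack
    # Standard BCa acceleration: sum(d^3) / (6 * sum(d^2)^1.5).
    # If every leave-one-out median is identical, diffs is all zeros,
    # so this is 0/0 -> nan and the CI limits become undefined.
    return (diffs ** 3).sum() / (6.0 * ((diffs ** 2).sum()) ** 1.5)

rng = np.random.default_rng(0)
quantized = rng.integers(0, 5, size=301).astype(float)  # heavy ties at the median
print(jackknife_acceleration(quantized))  # nan (with a divide warning)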

We tried dabestr on the same dataset you provided, and it generates a result similar to DABEST-MATLAB's. The reason is that the bootci/boot.ci functions used by DABEST-MATLAB and dabestr don't use the jackknife method to resample, so the medians across resamples don't all come out the same and the acceleration value can be calculated. Because of that, the CIs produced by DABEST-MATLAB and dabestr look fine but are actually wrong.

Further actions:

- We will fix the bug in dabestr to ensure it uses the jackknife.
- We may also add code to issue warnings about quantized data, as sketched below.
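A hypothetical version of that quantized-data check (the function name and threshold are illustrative, not the planned implementation):

import numpy as np
import warnings

def warn_if_quantized(x, max_distinct_ratio=0.1):
    # Hypothetical heuristic: warn when the number of distinct values is
    # small relative to the sample size, i.e. the data are quantized.
    x = np.asarray(x, dtype=float)
    if np.unique(x).size < max_distinct_ratio * x.size:
        warnings.warn("Data appear quantized; the BCa CI for the median "
                      "may be undefined. Consider the mean difference or "
                      "a percentile CI.", UserWarning)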

seamusholden commented 1 year ago

Thanks for looking into that so thoroughly. Are you sure that the current MATLAB/R bootstrap CI implementations are faulty? It seems like the jackknife acceleration calculation is failing on the quantized data, rather than those implementations producing unreasonable results.

As usual, the MATLAB documentation is a bit unclear, but it refers to issues with "lumpy" data, suggesting that workarounds for quantized data are indeed reasonable. And surely the fact that both the ad hoc add-a-small-amount-of-noise approach and the MATLAB bootci function give the same CI distribution supports this?

From the MATLAB help file, regarding how the bias-correction factor z0 is calculated:

Bias corrected and accelerated percentile method [3], [4]. This method involves a z0 factor computed using the proportion of bootstrap values that are less than the original sample value. To produce reasonable results when the sample is lumpy, the software computes z0 by including half of the bootstrap values that are the same as the original sample value.
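In Python terms, the tie-handling tweak described there would look something like this (my sketch of the documented behaviour, not DABEST or MathWorks code):

import numpy as np
from scipy.stats import norm

def z0_naive(boots, theta_hat):
    # Bias-correction factor from the proportion of bootstrap values
    # strictly below the original estimate. With lumpy data this
    # proportion can be 0 or 1, making z0 infinite.
    return norm.ppf(np.mean(boots < theta_hat))

def z0_lumpy(boots, theta_hat):
    # MATLAB-style tweak: also count half of the bootstrap values that
    # tie with the original estimate, keeping z0 finite for lumpy data.
    prop = np.mean(boots < theta_hat) + 0.5 * np.mean(boots == theta_hat)
    return norm.ppf(prop)

boots = np.array([2.0] * 100)  # degenerate case: all resamples tie
print(z0_naive(boots, 2.0))    # -inf
print(z0_lumpy(boots, 2.0))    # 0.0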

Jacobluke- commented 1 year ago

Hi @seamusholden ,

Thanks for pointing out the problem. MATLAB does provide a reasonable workaround for quantized ("lumpy") data; if the acceleration calculation follows the original jackknife method, it cannot produce reasonable results. In dabestr, however, a parameter is missing to ensure the program uses the jackknife method to generate the acceleration value. See the empinf function documentation:

[screenshot: empinf function documentation]

As we mentioned earlier, the jackknife method fails for quantized (lumpy) data, and it's even worse when dealing with quantile statistics. For bootstrapping, it's recommended to avoid quantile statistics like the median.

Further action

Cheers, @Jacobluke-