How to obtain a p-value using eqvartest

ColineVW commented 1 year ago

Hi,

I am currently working with the function "eqvartest" in Python and I was wondering how I can get the p-value associated to each cluster (such as in the t-test inference) when the computed f-value is above the threshold (fcrit)

Thanks!

Coline

0todd0000 commented 1 year ago

There are two main reason that cluster-level p-values are not reported for spm1d.stats.eqvartest:

This was a third-party add-on (Kowalski et al., 2021) and p-value calculations were not provided in that add-on
The primary purpose of the test is simply to test variance equality, so the primary null hypothesis rejection decision should be sufficient for reporting purposes; should you require cluster p-values it would be appropriate to report "p < alpha" for each cluster.

If you would prefer to have better values than just "p<alpha", cluster-specific p-values can be calculated separately using rft1d as follows:

Use the rft1d.geom.ClusterMetricCalculator class to calculate cluster extents (see Computing upcrossing metrics)
Use the rft1d.prob.RFTCalculator class to calculate cluster p-values as shown below (see also this example)

import rft1d
calc = rft1d.prob.RFTCalculator(STAT='Z', nodes=101, FWHM=fwhme)
p    = calc.p.cluster(k, h)

where:

fwhme is the estimated FWHM; this can be obtained from the residuals using rft1d.geom.estimate_fwhm
k is a given cluster's extent
h is the critical test statistic value at alpha (i.e., the second output argument of spm1d.stats.eqvartest)

NOTE: I have added this issue to the to-do list for the next major release of spm1d (v0.5)

Reference:

Kowalski, E., Catelli, D. S., & Lamontagne, M. (2021). A waveform test for variance inequality, with a comparison of ground reaction force during walking in younger vs. older adults. Journal of Biomechanics 127: 110657. https://doi.org/10.1016/j.jbiomech.2021.110657

ColineVW commented 1 year ago

Hi,

Thanks a lot for your quick reply, it helps us a lot. We have modified the eqvartest function to add these lines of code which allows us to get the p-value for each cluster. We got some results but we are not sure about the input for "df" inside the RFTCalculator. Most of the examples do not match our data set and for instance, in the following reference it is written in the RFTCalculator function (df = (1,df)), without details on the meaning of "1", ref : https://spm1d.org/rft1d/Examples/Application.html)

Currently, we want to compare two groups with 25 time series each. So, for the eqvartest our df is df = (24,24). Should we use the same df input for both eqvartest and RFTCalculator or is there a difference?

Please, find below the code we have added in the eqvartest :

calc = rft1d.geom.ClusterMetricCalculator()
a=calc.cluster_extents(f, fcrit,interp=True)
a_resels = [kk/efwhm for kk in a]
calc2 = rft1d.prob.RFTCalculator(STAT='F',df = (24,24), nodes=101, FWHM=efwhm)
kmin = min(a_resels)
pCluster = [calc2.p.cluster(kk, fcrit)   for kk in a_resels]

0todd0000 commented 1 year ago

We got some results but we are not sure about the input for "df" inside the RFTCalculator.

df is the degrees of freedom and the two components are: (numerator, denominator). For this test df = (J0-1, J1-1) where J0 and J1 are the sample sizes of the two groups. So your use of df=(24,24) for J0=J1=25 is correct.

Should we use the same df input for both eqvartest and RFTCalculator or is there a difference?

df should be the same in all calculations.

Please, find below the code we have added in the eqvartest :

Your code looks good. However, please consider the following two points:

It is not possible at this stage to verify whether these calculations are accurate. Doing so requires systematic numerical validations for different cases involving variations of sampling sizes, smoothness, unequal variance, etc. This requires a fairly substantial programming effort, and until that validation has been completed it is not possible to conclude that these calculations are accurate. Cluster-level p-value accuracy has been verified for all other procedures in spm1d, but not yet for eqvar. This is another reason why cluster-level p-values are not yet provided for this test.
Please avoid modifying the source code if possible. Please understand that source code modifications cannot be supported in this forum. So instead of modifying eqvar.py I recommend creating your own function, possibly a copy of eqvartest to conduct this analysis. Source code modifications can indeed be considered if you submit you modifications via a pull request on GitHub, but in this case these changed would not be accepted because validation has not yet been conducted.

ColineVW commented 1 year ago

Thanks again for your response, it was really helpful. Regarding the second point, my sentence was not clear, we have indeed created another function based on eqvartest in which we added these lines of code to conduct this analysis. Thanks!

0todd0000 / spm1d

How to obtain a p-value using eqvartest #247