0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

Validity of SPM for different sampling frequencies #184

Closed 0todd0000 closed 2 years ago

0todd0000 commented 3 years ago

(These questions are paraphrased from an email)

0todd0000 commented 3 years ago

Difficult questions! I'll start with the easiest...


Yes. Please see the power1d package and this sample size estimation example in particular.

Here are two additional references:

https://doi.org/10.1016/j.jbiomech.2021.110451 https://doi.org/10.1016/j.jbiomech.2017.09.031


Smoothness is estimated essentially as the average residual derivative. First, the residuals are calculated (e.g., as differences from group means in a two-sample t test). Next, the derivative is estimated at each point in each 1D residual. Finally, the average smoothness is estimated, essentially as the average absolute derivative across points. The actual calculation is a bit more complex, but conceptually this is an adequate description.
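As a rough illustration of the three steps described above, here is a minimal NumPy sketch. It is conceptual only: `mean_abs_derivative` and `random_curves` are hypothetical helpers written for this example, not functions from spm1d or rft1d, and the actual smoothness calculation is more involved.

```python
import numpy as np

def mean_abs_derivative(y_a, y_b):
    """Conceptual roughness measure for a two-sample design.

    y_a, y_b: arrays of shape (n_observations, n_domain_points).
    Hypothetical helper; not the estimator used by spm1d or rft1d.
    """
    # Step 1: residuals = each observation minus its own group mean
    residuals = np.vstack([y_a - y_a.mean(axis=0), y_b - y_b.mean(axis=0)])
    # Step 2: finite-difference derivative along the domain (axis 1)
    derivative = np.gradient(residuals, axis=1)
    # Step 3: average absolute derivative across observations and points
    return np.abs(derivative).mean()

def random_curves(n, q, width, rng):
    """Unit-variance noise smoothed by a moving average of the given width."""
    kernel = np.ones(width) / width
    raw = rng.standard_normal((n, q + width - 1))
    y = np.array([np.convolve(row, kernel, mode="valid") for row in raw])
    return y / y.std()

rng = np.random.default_rng(0)
rough = mean_abs_derivative(random_curves(8, 101, 1, rng),
                            random_curves(8, 101, 1, rng))
smooth = mean_abs_derivative(random_curves(8, 101, 20, rng),
                             random_curves(8, 101, 20, rng))
print(f"rough data: {rough:.3f}, smooth data: {smooth:.3f}")
```

Smoother residuals give a smaller average absolute derivative, which is the quantity that feeds into the probabilistic calculations.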

This estimated smoothness directly affects various probabilistic quantities, including the critical threshold. When the data are rougher, the threshold is higher because rougher data are expected to have larger maximum deviations simply by chance. More details are available in rft1d and, in particular, in the theory overview.
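The link between roughness and threshold can be checked by simulation. The sketch below is an assumption-laden illustration, not the random field theory calculation in rft1d: it generates unit-variance noise curves, smooths them with a moving average (a stand-in for whatever smoothing process produced the real data), and takes the 95th percentile of the curve-wise maximum as an empirical critical threshold. The function name `simulated_threshold` is hypothetical.

```python
import numpy as np

def simulated_threshold(width, q=101, n_sim=2000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum of smooth noise curves.

    width: moving-average window; width=1 means uncorrelated (rough) noise.
    """
    rng = np.random.default_rng(seed)
    kernel = np.ones(width) / width
    raw = rng.standard_normal((n_sim, q + width - 1))
    fields = np.array([np.convolve(row, kernel, mode="valid") for row in raw])
    fields /= fields.std()           # rescale to unit variance
    maxima = fields.max(axis=1)      # one maximum per simulated curve
    return np.quantile(maxima, 1 - alpha)

rough_threshold = simulated_threshold(width=1)
smooth_threshold = simulated_threshold(width=25)
print(f"rough: {rough_threshold:.2f}, smooth: {smooth_threshold:.2f}")
```

The rough curves have more effectively independent points, so their maximum is larger by chance alone and the threshold needed to control the false positive rate is higher.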


There are no theoretical lower or upper bounds to domain length (i.e., number of domain points) as far as I know. However, as the number of domain points shrinks (e.g. to just 2 or 3 points), the SPM solutions will converge to standard multivariate solutions. Conversely, SPM solutions diverge from standard multivariate solutions as both (i) the number of domain points increases and (ii) smoothness increases.

spm1d expects at least 10 domain points, and it may generate an error if you attempt to submit data with fewer points. However, this error does not relate to a theoretical limitation. It is instead more of a warning to users, to ensure that the data arrays are correctly formatted (e.g., an 8x101 array, with 8 observations and 101 domain points, as opposed to a 101x8 array).
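A quick orientation check before running an analysis can catch a transposed array early. The helper below is a hypothetical sketch, not spm1d's own validation code; it assumes rows are observations and columns are domain points, and it reuses the 10-point minimum mentioned above.

```python
import numpy as np

def check_orientation(y, min_points=10):
    """Hypothetical pre-flight check: expects (n_observations, n_domain_points)."""
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError(f"expected a 2D array, got {y.ndim}D")
    n_obs, n_points = y.shape
    if n_points < min_points:
        raise ValueError(
            f"only {n_points} domain points for {n_obs} observations; "
            "is the array transposed?"
        )
    return n_obs, n_points

y = np.zeros((101, 8))               # transposed by mistake
try:
    check_orientation(y)
except ValueError as err:
    print("rejected:", err)
print("accepted:", check_orientation(y.T))
```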


Yes, it is valid, but there is a problem unrelated to SPM to consider: the data themselves might not be valid measurements if the sample frequency is too low. If the sample frequency is too low, signal aliasing can occur, which can distort the true nature of the data. In general, the sampling frequency must exceed the Nyquist rate, i.e., twice the highest frequency present in the signal.
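To make the aliasing problem concrete, here is a small sketch with made-up numbers: a 9 Hz sine wave sampled at only 10 Hz produces exactly the same samples as a (sign-flipped) 1 Hz sine wave, so the two cannot be told apart after sampling.

```python
import numpy as np

signal_hz = 9.0      # true frequency of the signal (illustrative value)
sample_hz = 10.0     # sampling frequency, well below 2 * signal_hz

t = np.arange(20) / sample_hz                    # 2 seconds of sample times
sampled = np.sin(2 * np.pi * signal_hz * t)      # what the sensor records
alias = -np.sin(2 * np.pi * 1.0 * t)             # a 1 Hz wave, sign-flipped

# The recorded samples are indistinguishable from the 1 Hz alias
same = np.allclose(sampled, alias)
print("samples match the 1 Hz alias:", same)
```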

Also relevant are the frequency characteristics of the statistical signal. The Nyquist frequency pertains to individual measurements, but statistical analyses pertain to sets of measurements. Thus it is possible to measure the data well under the Nyquist frequency, provided the measurement frequency is still adequate to capture the statistical signal. Here is a paper that considers this issue for 2D data:

https://doi.org/10.1016/j.jbiomech.2012.05.038

A pre-print PDF is also available here.

From Fig. 8 in the paper above, you'll see that it is possible to sample at a very low frequency and still adequately capture low-frequency statistical signals.

KevinGiordano commented 3 years ago

Great answer, thank you! Based on your answer to question 3 on the length of the sample, do you recommend not interpolating data unless there's a need (like wanting % of a motion from 0-100%)? If that is the case, is there a limit to how much you can expand a set through interpolation (i.e., 70 frames to 100 vs. 120 frames to 100 vs. 25 frames to 100)?

Thanks again, Kevin

0todd0000 commented 3 years ago

No, there is no upper limit on the number of frames (Q) that can be used. SPM results are unaffected by Q, provided Q is sufficiently large to exceed the Nyquist frequency. Thus you could use Q=100, Q=1000, or Q=1 million, and the SPM results will be unaffected (within numerical tolerance). For very smooth data, Q=20 might even be sufficient.

The reason is that SPM considers smoothness (i.e., the derivative magnitude) relative to Q. This is what separates SPM from simple multiple testing correction procedures like the Bonferroni correction, which considers only Q and not the correlation amongst adjacent frames (i.e., smoothness).
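The contrast with Bonferroni can be seen numerically. The sketch below assumes a one-sided z test at alpha = 0.05 purely for illustration (`bonferroni_z` is a hypothetical helper): because the Bonferroni threshold depends only on Q, it keeps rising as the same data are resampled to more frames, whereas a smoothness-based threshold would not.

```python
from statistics import NormalDist

def bonferroni_z(q, alpha=0.05):
    """One-sided critical z value after Bonferroni correction over q frames."""
    return NormalDist().inv_cdf(1 - alpha / q)

# The threshold grows with Q even though the underlying data are unchanged
for q in (20, 100, 1000, 1_000_000):
    print(f"Q = {q:>9,d}: z* = {bonferroni_z(q):.2f}")
```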

KevinGiordano commented 3 years ago

I may have asked the question incorrectly - I'm curious about the interpolation, specifically. I wouldn't want to interpolate 50 data frames into 1000. If the Nyquist frequency is satisfied with 30 frames and I have 50, as a statistician, would you buy it if I interpolated that into 100 to give time as a percent of the movement (i.e., doubling the amount of data via interpolation)?

0todd0000 commented 3 years ago

It depends on what the interpolation involves. If it is simple linear interpolation, then there is no problem upsampling from Q=30 frames to an arbitrary Q>30, because simple linear interpolation will not create information, provided you use a smoothness-dependent correction procedure like SPM or FDA.
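For reference, linear upsampling to a 0-100% time base can be sketched with `np.interp`. The helper name `upsample_linear` and the random test data are assumptions for illustration. Each interpolated value is a weighted average of its two neighbouring frames, so the upsampled curves never leave the range of the original data.

```python
import numpy as np

def upsample_linear(y, q_new):
    """Linearly resample each row of y (observations x frames) to q_new frames."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    x_old = np.linspace(0, 100, y.shape[1])   # original frames as % of movement
    x_new = np.linspace(0, 100, q_new)        # target time base
    return np.array([np.interp(x_new, x_old, row) for row in y])

rng = np.random.default_rng(0)
y30 = rng.standard_normal((8, 30))            # 8 observations, 30 frames
y101 = upsample_linear(y30, 101)              # 0-100% in 1% steps

print(y30.shape, "->", y101.shape)
```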

If instead you use spline interpolation, or interpolate over missing frames, or use any other form of nonlinear interpolation, then you would run the risk of creating artificial information. In this case there could indeed be a statistical problem, and various interpolation schemes should probably be compared in a sensitivity-type analysis to show that no particular scheme yields biased analyses.

KevinGiordano commented 3 years ago

Sorry, I should have specified - I have been using a cubic spline interpolation. If you have any further thoughts on how much of a proportionate increase is acceptable, please share. If not, you've already helped immensely, thank you!

0todd0000 commented 3 years ago

It's difficult to estimate what upsampling ratio is appropriate unless the signal, noise and sampling frequencies are all known. If they're unknown, the best approach is probably the most cautious one: upsample using the smallest possible increase, then run analyses at both (a) the original and (b) upsampled frequencies, then qualitatively compare the results for (a) and (b). If upsampling does not qualitatively affect the results then there is no problem.
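The (a) versus (b) comparison can be sketched as follows. Several things here are assumptions for illustration: the data are simulated, a plain pointwise two-sample t statistic stands in for the full SPM analysis, and linear interpolation via `np.interp` stands in for whichever interpolation scheme is actually under test (e.g., a cubic spline). In practice the same pattern would be applied with the real data, the real interpolation scheme, and the real SPM analysis.

```python
import numpy as np

def t_two_sample(y_a, y_b):
    """Pointwise pooled-variance two-sample t statistic along the domain."""
    n_a, n_b = len(y_a), len(y_b)
    pooled = ((n_a - 1) * y_a.var(axis=0, ddof=1)
              + (n_b - 1) * y_b.var(axis=0, ddof=1)) / (n_a + n_b - 2)
    return (y_a.mean(axis=0) - y_b.mean(axis=0)) / np.sqrt(pooled * (1 / n_a + 1 / n_b))

def upsample(y, q_new):
    """Stand-in interpolation scheme (linear); swap in the scheme under test."""
    x_old = np.linspace(0, 1, y.shape[1])
    x_new = np.linspace(0, 1, q_new)
    return np.array([np.interp(x_new, x_old, row) for row in y])

rng = np.random.default_rng(1)
y_a = np.cumsum(rng.standard_normal((8, 51)), axis=1)        # simulated group A
y_b = np.cumsum(rng.standard_normal((8, 51)), axis=1) + 1.0  # simulated group B

t_orig = t_two_sample(y_a, y_b)                               # (a) original frames
t_up = t_two_sample(upsample(y_a, 101), upsample(y_b, 101))   # (b) upsampled

print(f"max |t| original:  {np.abs(t_orig).max():.3f}")
print(f"max |t| upsampled: {np.abs(t_up).max():.3f}")
```

If the two sets of results lead to the same qualitative conclusions, the upsampling is unlikely to be a concern; a clear discrepancy would suggest the interpolation scheme is adding artificial information.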