0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

SPM with multiple trials #224

Closed adammattiussi closed 2 years ago

adammattiussi commented 2 years ago

Hi Todd,

I recently watched your lecture with Stuart McErlain-Naylor on SPM. It was very helpful, thank you.

I am reaching out as I would like to know how you might wrangle your continuous data for SPM if you had multiple trials per participant. Would you include all unique trials? Or take a mean across each time point once time-normalised? Or select one trial to include in the SPM analysis based on the 'best performance'?

Any thoughts would be greatly appreciated.

Thank you in advance!

0todd0000 commented 2 years ago

Including all trials is generally a good idea because it helps with smoothness estimates: the more observations, the more accurate the smoothness estimate.

However, including all trials vs. using just means will likely have negligible effects on the results, especially if there is a large experimental effect.

I would recommend against selecting one trial unless there is an objective reason to select it. If it is a sports movement and the selected trial is the one that resulted in the highest / fastest / strongest performance then that could indeed be used. If there is no objective criterion for selecting a single trial then the within-subject mean would be better.
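
As a rough illustration of the within-subject mean option, the sketch below time-normalises each trial to a common number of points and then averages across trials at each point. It uses only NumPy; the helper names (`time_normalise`, `within_subject_mean`), the choice of 101 nodes, linear interpolation, and the synthetic data are all assumptions made for this example and are not part of spm1d.

```python
import numpy as np


def time_normalise(trial, n_nodes=101):
    """Linearly resample a 1D trial to n_nodes points (0-100% of the movement)."""
    trial = np.asarray(trial, dtype=float)
    old_time = np.linspace(0.0, 1.0, trial.size)
    new_time = np.linspace(0.0, 1.0, n_nodes)
    return np.interp(new_time, old_time, trial)


def within_subject_mean(trials, n_nodes=101):
    """Time-normalise each trial, then average across trials at each node."""
    resampled = np.array([time_normalise(t, n_nodes) for t in trials])
    return resampled.mean(axis=0)


# Hypothetical data: three trials of different lengths from one subject
rng = np.random.default_rng(0)
trials = [
    np.sin(np.linspace(0.0, np.pi, n)) + 0.05 * rng.standard_normal(n)
    for n in (87, 95, 110)
]

subject_mean = within_subject_mean(trials)  # one (101,) curve for this subject
```

Stacking one such mean curve per subject gives one row per subject, which is the "one observation per subject" layout.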

adammattiussi commented 2 years ago

That’s really helpful, thank you.

I am inclined to use all trials/mean as opposed to a single trial with the ‘best performance’. The data are jump landings so I don’t want to select a landing based on take off performance.

I appreciate that you say the difference between using all trials and a mean is negligible. However, if I were to use all trials, I assume the SPM function in Python will accept this? Is there a limit on how many trials it will accept? I typically use R for all of my data analysis so I am not familiar with Python or the SPM function yet. I believe in the videos of you/Stuart demonstrating in MATLAB only a single trial was used, hence my questions. My intuition would be that the input data frame would contain a column for each unique trial, so if I were to input five trials for two conditions it would be a data frame of ten columns.

Thanks again.

0todd0000 commented 2 years ago

The API differs slightly between functions, so whether anything needs to be adjusted depends on which function you use. All ANOVA-related functions accept extra trials without problems; there is no limit, but there is no memory management, so it may not handle millions or billions of observations. t-tests and regression will not accept extra trials; extra observations are implicitly regarded as unique subjects if it is a multi-subject analysis, so one observation (e.g. a within-subject mean) must be used for each subject.

spm1d does not use data frames; it only accepts NumPy arrays as the dependent variable inputs. Dependent variable arrays must be (J,Q) or (J,Q,I) where:

J = number of observations
Q = number of continuum nodes
I = number of vector components

Thus five trials with two conditions would be either two (5,Q) arrays or one (10,Q) data array depending on the function to which you want to submit the data.
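
The two layouts can be sketched with NumPy as follows. The data are random placeholders, Q = 101 is an assumed number of nodes, and the variable names are hypothetical. The spm1d calls appear only in comments and are not executed here; check them against the spm1d documentation before use.

```python
import numpy as np

Q = 101       # assumed number of continuum nodes after time normalisation
n_trials = 5  # trials per condition

# Placeholder data standing in for time-normalised trials (rows = observations)
rng = np.random.default_rng(1)
trials_A = rng.standard_normal((n_trials, Q))
trials_B = rng.standard_normal((n_trials, Q)) + 0.5

# Layout 1: two (5, Q) arrays, one per condition.
# This is the form taken by two-sample functions, e.g. spm1d.stats.ttest2(YA, YB).
# Note that each row is then treated as an independent observation.
YA, YB = trials_A, trials_B

# Layout 2: one (10, Q) array plus a length-10 vector of condition labels.
# This is the form taken by ANOVA-style functions, e.g. spm1d.stats.anova1(Y, A).
Y = np.vstack([trials_A, trials_B])
A = np.repeat([0, 1], n_trials)

print(YA.shape, YB.shape, Y.shape, A)
```

Coming from R, note that observations are rows here, not columns: a data frame with one column per trial would need to be transposed before being converted to a NumPy array.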

adammattiussi commented 2 years ago

Ok great. I still have a bit more data cleaning to do before I get to the SPM but this is a really helpful starting point.

Thanks again.