Centre-IRM-INT / GT-MVPA-nilearn

GT MVPA nilearn from Marseille

More numerous data or more reliable data for MVPA? #23

Open · JeanneCaronGuyon opened this issue 2 years ago

JeanneCaronGuyon commented 2 years ago

Hi all,

It's me again! So, big question on an old topic, I guess... Should we prefer more data or more reliable data for MVPA?

Let me explain: we have chosen to go with 1 beta per trial to train and test our MVPA classifiers. A little reminder: for the VisuoTact project we get 7 trials / condition / run, we have 6 runs in total (so 42 trials / condition overall), and we use a leave-two-runs-out procedure, so each split has 4 training runs and 2 testing runs. That's great because it gives us "many" trials to train on. However, 1 beta per trial is probably super noisy.
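For concreteness, here is a minimal sketch of that leave-two-runs-out scheme using scikit-learn, on purely hypothetical data (the `betas`, `labels` and `runs` arrays below are placeholders standing in for the real VisuoTact single-trial beta maps):

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical single-trial data: 2 conditions x 7 trials x 6 runs = 84 samples
rng = np.random.default_rng(0)
n_voxels = 500
betas = rng.normal(size=(84, n_voxels))    # one (flattened) beta map per trial
labels = np.tile(np.repeat([0, 1], 7), 6)  # 7 trials per condition in each run
runs = np.repeat(np.arange(6), 14)         # run index of each trial

# Leave-two-runs-out: 4 training runs and 2 testing runs in every split
cv = LeavePGroupsOut(n_groups=2)
clf = make_pipeline(StandardScaler(), LinearSVC())

scores = cross_val_score(clf, betas, labels, groups=runs, cv=cv)
print(f"trial-level betas: {len(scores)} splits, mean accuracy = {scores.mean():.3f}")
```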

Now, if we take 1 beta / run (per condition), we end up with 7 times less data, but each of these "averaged" betas should be more reliable, more robust, less noisy. But would that lead to a less accurate estimate of the hyperplane that separates our conditions?
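As a rough sketch of that alternative (continuing the hypothetical example above, and approximating the run-wise GLM by simply averaging the single-trial betas within each run and condition, whereas a proper run-wise GLM would estimate those betas directly):

```python
# Average the 7 single-trial betas within each (run, condition) pair:
# 84 trial-wise samples -> 12 run-wise samples (6 runs x 2 conditions)
run_betas, run_labels, run_groups = [], [], []
for run in np.unique(runs):
    for cond in np.unique(labels):
        mask = (runs == run) & (labels == cond)
        run_betas.append(betas[mask].mean(axis=0))
        run_labels.append(cond)
        run_groups.append(run)

scores_run = cross_val_score(clf, np.vstack(run_betas), run_labels,
                             groups=run_groups, cv=cv)
print(f"run-level betas: mean accuracy = {np.mean(scores_run):.3f}")
```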

Below is an image taken from Martin Hebart's class.

I can see the potential differences, but what would be your wild guess on which to choose? Why did we historically decide to go for one beta per trial instead of keeping our good old one-beta-per-run GLM? And maybe more specifically in the context of our study, with 6 runs and 7 trials / condition / run?

Thanks for your input! Jeanne

[Screenshot attached: Capture d’écran 2022-03-08 à 14 46 23]
SylvainTakerkart commented 2 years ago

Hi, the RMN reminds me that I've never replied... Sorry!

The title you chose for this issue perfectly summarizes the dilemma, for which, of course, there is no definitive answer; as usual, the best solution is a compromise between the number of data points and their quality! Your post forgets only one thing: early on, lots of MVPA studies actually used single-TR BOLD images as inputs... So the fact that lots of people now use single-trial beta maps already represents a good solution to this sought-after compromise between, 1. at one extreme, single-TR BOLD images, and 2. at the other extreme, the "one beta per run" solution.

Another element: don't forget that, in principle, a classifier (or more generally a machine learning model) will be good when we have a good estimate of the distribution of the underlying data (the separator used to classify is only a characteristic driven by the distributions of the two classes)... So even if the algorithm used to estimate the separator / classifier does not explicitly try to estimate the underlying distributions of the two classes, a sound intuition is that "the more data points the better" and that "capturing the characteristics of the noise, i.e. the distribution around the mean, is probably good for you"... [but, as you say, if the data is too noisy, it might be a mess ;) ]

Finally, the transition from "single TR images" to "single-trial beta maps" has another advantage: it gets rid of the strong temporal correlation that affects the input data when you use "single TR images"! Since having independent data points is of great importance when training ML models, "single-trial beta maps" are THE solution that gives you the most numerous data points while ensuring (approximately) the independence of your observations (and, on top of this, you get rid of a good amount of the noise that is present in "single TR images").

SylvainTakerkart commented 2 years ago

Pragmatically, if you want to experiment with this sought-after compromise, it could be done this way:
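One possible way to set this up (just a sketch, not necessarily the exact procedure meant here, and reusing the hypothetical `betas`, `labels`, `runs`, `clf` and `cv` objects from the sketches above) is to average the single-trial betas in chunks of increasing size within each run and condition, and to compare decoding accuracies across chunk sizes, from 1 (single trials) to 7 (one beta per run and condition):

```python
def chunked_betas(betas, labels, runs, chunk_size):
    """Average single-trial betas in chunks of `chunk_size` within each run/condition."""
    X, y, g = [], [], []
    for run in np.unique(runs):
        for cond in np.unique(labels):
            trials = betas[(runs == run) & (labels == cond)]
            n_chunks = len(trials) // chunk_size  # leftover trials are dropped
            for i in range(n_chunks):
                X.append(trials[i * chunk_size:(i + 1) * chunk_size].mean(axis=0))
                y.append(cond)
                g.append(run)
    return np.vstack(X), np.array(y), np.array(g)

for chunk_size in (1, 2, 3, 7):  # 1 = trial-level betas, 7 = run-level betas
    X, y, g = chunked_betas(betas, labels, runs, chunk_size)
    acc = cross_val_score(clf, X, y, groups=g, cv=cv).mean()
    print(f"chunk_size={chunk_size}: {X.shape[0]} samples, mean accuracy = {acc:.3f}")
```

On real data (rather than the random placeholders used in the sketches), such a curve would show where the trade-off between the number of samples and their reliability lands for this particular dataset.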