LIMO-EEG-Toolbox / limo_tools

Hierarchical Linear Modelling for MEEG data
https://limo-eeg-toolbox.github.io/limo_meeg/

Split regression and normalization #97

Closed NirOfir closed 2 years ago

NirOfir commented 2 years ago

Hi, much more a question than an issue:

In std_limo(), specifically in limo_split_continuous() (called at line 429), the continuous predictor is split according to the categorical predictor and then normalized. I wondered why the operations are in that order.

For instance, say I have a target detection experiment, and I want to predict the EEG from the subject's decision ("target"/"no target"), the target intensity (continuous, with say 10 levels tested in the experiment), and their interaction (so that target intensity need not have the same effect depending on the subject's report). There is a strong correlation between the subject's report and target intensity: for weak targets, subjects will almost always report that they found no target, and for strong targets they will almost always report finding the target. When I normalize the intensity separately by subject report, the same target intensity is coded by different numbers: a mid-level target (one that subjects report finding in 50% of the trials) gets a positive value in the "no target" trials (because it is stronger than most targets that subjects missed), but a negative value in the "target" trials (because it is weaker than most targets that subjects detected).

What is the advantage of normalizing after splitting? I feel like it might make the interpretation of the model coefficients more confusing. I also searched for information on normalizing after splitting for GLMs in general, but didn't find anything helpful.
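To make the coding issue concrete, here is a minimal NumPy sketch (hypothetical simulated data, not the toolbox's MATLAB code) of z-scoring an intensity regressor separately within each report condition. It reproduces the effect described above: the same mid-level intensity is coded positive among "no target" trials and negative among "target" trials.

```python
import numpy as np

# Hypothetical data: 10 target intensities, reports correlated with intensity
# (weak targets -> mostly "no target", strong targets -> mostly "target").
rng = np.random.default_rng(0)
intensity = np.repeat(np.arange(1, 11), 20).astype(float)   # 200 trials
p_target = 1.0 / (1.0 + np.exp(-(intensity - 5.5)))         # detection probability
report = rng.random(intensity.size) < p_target              # True = "target" report

def zscore(x):
    return (x - x.mean()) / x.std()

# Normalizing AFTER splitting: z-score the intensity separately per report.
split_coded = np.empty_like(intensity)
split_coded[report] = zscore(intensity[report])
split_coded[~report] = zscore(intensity[~report])

# A mid-level intensity (5) sits above the mean of missed-target intensities
# but below the mean of detected-target intensities, so its coded value
# flips sign between the two conditions.
mid = intensity == 5
print(split_coded[mid & ~report].mean())   # positive
print(split_coded[mid & report].mean())    # negative
```

The sign flip is exactly the interpretation question raised here: after within-condition standardization, a beta on the split regressor is a slope over condition-relative intensity, not over absolute intensity.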

Thanks!

EDIT: OK, I think this can actually be rephrased as the common question of "should we standardize before or after including the interaction term?". What confused me is that LIMO always uses a "full" coding of categorical predictors, whereas people typically discuss some contrast coding (only k-1 betas for a predictor with k levels, instead of the full k that LIMO uses). So LIMO first adds the interaction, then standardizes. I was indoctrinated to go the other way around :)

CPernet commented 2 years ago

If requested, we split per condition. This has nothing to do with the interaction per se; it simply avoids assuming that a continuous variable behaves the same across all conditions. It is not an interaction term in itself (we simply go from XXXXXXXXX to XXX000000 000XXX000 000000XXX for 3 categories, for instance), but yes, using contrasts you can compute the equivalent of the interaction, e.g. an F test [0 0 0 -1 0 1 ; 0 0 0 0 -1 1]. The advantage of a full-rank coding is that you can also test all sorts of other effects, for instance [0 0 0 1 1 1], as if it were a single regressor.
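The per-condition split described above can be sketched as follows. This is a toy NumPy illustration (invented data and column layout, not LIMO's actual MATLAB implementation): one continuous regressor is turned into one column per category, giving the XXX000000 / 000XXX000 / 000000XXX pattern, alongside the full (non-contrast) condition coding, with the contrast vectors from the reply expressed over the split columns.

```python
import numpy as np

# Toy sketch: one continuous regressor x over 9 trials in 3 categories.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9.])   # continuous values
cat = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])          # category of each trial

# Split the single column XXXXXXXXX into one column per category:
# column c holds x on category-c trials and 0 elsewhere.
split = np.zeros((x.size, 3))
split[np.arange(x.size), cat] = x

# Full-rank condition coding (k columns for k levels, as LIMO uses).
dummies = np.eye(3)[cat]
X = np.hstack([dummies, split])                      # 3 condition + 3 split columns

# Contrasts over the split columns (indices 3-5), as in the reply:
C_interaction = np.array([[0, 0, 0, -1, 0, 1],       # F test: do the slopes differ?
                          [0, 0, 0,  0, -1, 1]])
C_single = np.array([0, 0, 0, 1, 1, 1])              # treat the splits as one regressor
```

Summing the split columns recovers the original regressor, which is why the [0 0 0 1 1 1] contrast acts like a single-regressor effect.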