cvnlab / GLMsingle

A toolbox for accurate single-trial estimates in fMRI time-series data
BSD 3-Clause "New" or "Revised" License

What are the additional elements added as predictors in the output? #111

Closed MtKana closed 1 year ago

MtKana commented 1 year ago

Hi, I am trying to understand the details of your package using example1.ipynb.

There, the design matrix for each run has a shape of (300, 583), so at this point the NSD dataset has 583 predictors for the GLM analysis.

After running the full analysis (with all of the options turned on), there seem to be 750 predictors instead of 583: the shape of TYPED_FITHRF_GLMDENOISE_RR.npy is (145, 186, 1, 750).

What are the additional 167 predictors that are added during the analysis step of your algorithm?

I would appreciate any kind of help. Thank you.

kendrickkay commented 1 year ago

The 583 columns correspond to 583 distinct experimental conditions. Some of these conditions are presented more than once (multiple trials). The 750 outputs that you get are single-trial beta weights, one beta weight for each unique trial. Since there are some conditions with multiple trials, the number of outputs you get is more than 583.
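The relationship between conditions and trials can be illustrated with a small synthetic sketch (toy sizes and onset positions invented here, not the actual NSD design): each nonzero entry in the per-run design matrices marks one trial onset, so the number of single-trial betas equals the total count of ones across all runs, which can exceed the number of columns (conditions).

```python
import numpy as np

# Toy illustration: 2 runs, 4 conditions, 3 trial onsets per run.
n_timepoints, n_conditions = 20, 4
rng = np.random.default_rng(0)

designs = []
for _ in range(2):
    d = np.zeros((n_timepoints, n_conditions))
    # place 3 trial onsets at arbitrary timepoints, each in some condition column
    for t, c in zip([2, 8, 14], rng.integers(0, n_conditions, 3)):
        d[t, c] = 1
    designs.append(d)

# Each nonzero entry marks one trial onset, so the number of
# single-trial betas equals the total count of ones:
n_trials = int(sum(d.sum() for d in designs))
print(n_trials)  # 6 betas, even though there are only 4 conditions
```

With 583 condition columns and 750 total onsets, you get 750 single-trial betas by the same counting.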

MtKana commented 1 year ago

Thank you for the explanation, I really appreciate it. However, I am still confused.

I see in example3_BIDS.ipynb that the design matrix has a shape of (153, 5). This makes sense because that dataset had 5 different experimental conditions (i.e. 'ambient', 'country', 'metal', 'rocknroll', 'symphonic') in an auditory perception task. In example3, after running GLMsingle with a design matrix of (153, 5), I get back 50 different unique trials. But is it possible for there to be 583 different experimental conditions? I'm not very familiar with fMRI studies yet, so forgive me if I seem to be asking something stupid.

"Some of these conditions are presented more than once (multiple trials). The 750 outputs that you get are single-trial beta weights, one beta weight for each unique trial. Since there are some conditions with multiple trials, the number of outputs you get is more than 583."

For each timepoint (run volume), there is a certain experimental condition assigned to it. For example, at run volume 6 it was "symphonic", and at run volume 34 it was "symphonic" again. If I am understanding you correctly, would that mean that these single timepoints (each assigned a certain experimental condition) are treated as separate unique trials (even though both are "symphonic")? But then there would be a very large number of them, much more than 50. In fact, there should be as many trials as there are timepoints, because every timepoint has a condition assigned (the conditions are not continuous but change rapidly).

I would greatly appreciate any explanation of what the increase in the number of unique trials means. Thank you.

Edit: I'm also wondering why I can see a lot of beta weights in non-brain space, and whether the function can work with a single-run data input (i.e. a list of length 1). When I try it, I get the following error after the Type A and Type B outputs are generated:

```
    112 testids = stimix[testix]
    114 # vector of trial indices in the training data
--> 115 traincols = np.concatenate([validcolumns[x] for x in trainix])
    117 # vector of condition-ids in the training data
    118 trainids = np.concatenate([stimix[x] for x in trainix])

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate
```

kendrickkay commented 1 year ago

Yes, 583 seems large for the number of unique experimental conditions. But it is correct. That particular experiment (NSD) was unusual in that respect.

We use the term 'trial' to refer to each instance of a condition. The assumption is that from the experimenter's perspective, the multiple trials should be essentially identical (with any differences being attributable to "noise").

Note that "trial" refers to the whole evolution of the BOLD response over time, not to a single time point. The BOLD response needs at least 30-50 seconds to resolve and return to baseline (a very long time).

Beta estimates exist in non-brain areas because... that's the nature of noise. The analysis does not make any distinction between voxels within the brain and outside the brain, and there are time-series data in all voxels, so you will get a beta estimate even for voxels that we know have no actual signal in them.

Note that you need multiple runs (more than 1) to make use of the GLMdenoise and RR components of GLMsingle. If you do not have more than 1 run, you can turn off those features.
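One way to turn those features off is through the options dictionary passed when constructing the model. This is a hedged sketch: the option names `wantglmdenoise`, `wantfracridge`, and `wantlibrary` are taken from the GLMsingle documentation, so verify them against your installed version before relying on this.

```python
# Sketch: options for a single-run dataset, disabling the cross-validated
# stages that require multiple runs. Option names assumed from the
# GLMsingle docs; check them against your installed version.
opt = {
    'wantlibrary': 1,     # keep the HRF-fitting stage (Type B)
    'wantglmdenoise': 0,  # skip the GLMdenoise stage (Type C)
    'wantfracridge': 0,   # skip the ridge-regression stage (Type D)
}

# Typical usage (commented out; requires the glmsingle package and data):
# from glmsingle.glmsingle import GLM_single
# glmsingle_obj = GLM_single(opt)
# results = glmsingle_obj.fit(design, data, stimdur, tr, outputdir='output')
```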

MtKana commented 12 months ago

Thank you very much for your helpful explanation and fast reply!! I finally understand.

Just one more thing: is there a way to get the average beta weight for a certain condition? For example, if the output of GLMsingle (the "betasmd") is an array of (70, 70, 40, 168), this means there were 168 unique trials. But each design matrix had a shape of (590, 5), meaning there were 5 conditions in total (say, A to E). I want to know the beta estimate for each of the conditions "A", "B", "C", "D", "E".

kendrickkay commented 12 months ago

Yes. It is the responsibility of the user to compute the means of the betas accordingly. Basically, you find all the trials for condition "A", index them out, and then average the betas over those trials. GLMsingle doesn't do this for you.
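That indexing step can be sketched in NumPy. This is a toy example with synthetic sizes and invented onset positions, assuming (as in the examples) that the betas are shaped (X, Y, Z, n_trials), that trials are ordered chronologically within each run, and that each onset is a 1 in the design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
betas = rng.standard_normal((4, 4, 2, 12))   # e.g. "betasmd", 12 trials total

# Two runs, 5 conditions; mark fake onsets as (timepoint, condition) pairs.
designs = [np.zeros((30, 5)) for _ in range(2)]
onsets_per_run = [[(2, 0), (8, 3), (14, 0), (20, 4), (26, 1), (28, 2)],
                  [(3, 1), (9, 3), (15, 2), (21, 0), (25, 4), (29, 3)]]
for d, onsets in zip(designs, onsets_per_run):
    for t, c in onsets:
        d[t, c] = 1

# Recover each trial's condition: np.nonzero scans in row-major order,
# so the column indices come out in temporal order within each run.
condition_per_trial = np.concatenate([np.nonzero(d)[1] for d in designs])

# Average the betas over all trials of condition "A" (index 0):
mean_beta_A = betas[..., condition_per_trial == 0].mean(axis=-1)
print(mean_beta_A.shape)  # (4, 4, 2): one mean beta per voxel
```

The same boolean-mask pattern gives the per-condition means for "B" through "E" by swapping the condition index.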

MtKana commented 11 months ago

I see.

Thank you again!! It helped me very much.