Mensen / ept_TFCE-matlab

Advanced EEG Statistics

Loading fieldtrip data #7

Open Em-Fl opened 6 years ago

Em-Fl commented 6 years ago

Hi! I was wondering which format the data should be loaded in. I have 23 subjects with a 2x2 factorial design. As I'm a fieldtrip user I end up using cell arrays, but I gather that this is not the required input. Also, I need to provide a file with the electrode location information in the format eeglab uses. Can you let me know what this format looks like so I can generate it from fieldtrip's layouts?

Thank you for this code!

Mensen commented 6 years ago

Hi!

So if you have your data in a 2x2 cell array already then that can be used as input directly. You may need to reshape the matrix inside each cell though, so that it is a participants x channels x timepoints matrix.
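As a minimal sketch of that rearrangement, assume each condition starts out as a cell array with one channels x timepoints matrix per participant (this starting layout, and all variable names below, are assumptions for illustration and not something the toolbox prescribes). Stacking with cat and reordering with permute gives the participants x channels x timepoints layout:

```matlab
% Hypothetical sizes; replace with your own data
n_participants = 3;
n_channels = 4;
n_timepoints = 5;

% Build dummy data whose values encode their own position:
% hundreds = participant, tens = channel, ones = time sample
[ch, tp] = ndgrid(1:n_channels, 1:n_timepoints);
subject_data = cell(1, n_participants);
for s = 1:n_participants
    subject_data{s} = 100 * s + 10 * ch + tp;
end

% Stack along a new third dimension: channels x timepoints x participants
stacked = cat(3, subject_data{:});

% Reorder the dimensions to participants x channels x timepoints
condition_data = permute(stacked, [3 1 2]);

disp(size(condition_data));
disp(condition_data(2, 3, 4));   % participant 2, channel 3, sample 4
```

Because the dummy values encode their own indices, spot checks like the last line are a quick way to confirm that nothing was shuffled along the way.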

I'm not sure what kind of structure fieldtrip uses but I imagine it's quite similar. In either case, if you have the channel locations file then you can convert it to eeglab format using e_loc = readlocs('file_name'); it should figure out the file format on its own and convert it properly.

Let me know if you run into any issues... I'm sure we can sort them out.

Em-Fl commented 6 years ago

Hi Mensen, thank you for your response.

I've managed to shape my data to fit the script requirements. I'm still having problems with the channel location file. Creating a channel location file from eeglab, I get a struct with the following fields: labels, theta, radius, X, Y, Z, sph_theta, sph_phi, sph_radius, sph_theta_besa, sph_phi_besa.

There is no e_loc field present, so I end up getting the following error at line 195 of ept_TFCE.m:

Reference to non-existent field 'e_loc'.

Error in ept_TFCE (line 195)
e_loc = e_loc.e_loc;

I am not an eeglab user so I'm probably missing something, but the standard .loc files that come with eeglab (eeglab/sample_locs) also lack this e_loc field. Maybe I'm just not getting something very basic, but I would appreciate it if you could clarify this for me.
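One possible reading of that error, offered only as an assumption, is that line 195 operates on the output of load(), in which case the MAT-file would need to contain a variable literally named e_loc. The sketch below uses a made-up two-electrode structure and a temporary file to show how such a file behaves; it does not call the toolbox itself:

```matlab
% Hypothetical electrode structure with a few of the fields listed above
e_loc = struct('labels', {'Fz', 'Cz'}, ...
               'X', {0.7, 0}, 'Y', {0, 0}, 'Z', {0.7, 1});

% Save it under the variable name e_loc (the name is what matters here)
loc_file = [tempname '.mat'];
save(loc_file, 'e_loc');

% load() returns a struct with one field per saved variable,
% so the electrode structure is reached as loaded.e_loc
loaded = load(loc_file);
e_loc_from_file = loaded.e_loc;
delete(loc_file);

disp(fieldnames(loaded));
disp(e_loc_from_file(1).labels);
```

If the variable had been saved under any other name, loaded.e_loc would raise the same "Reference to non-existent field" error.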

Em-Fl commented 6 years ago

So, apparently I managed to get the electrode location structure working and I'm able to run ept_TFCE without problems. I'm still trying to figure out how to interpret the results, but I guess I'll end up getting the hang of it.

As I mentioned, I have a 2x2 design so I'm interested in doing a 2x2 repeated measures ANOVA. Can you explain how the input data should be structured in order to run ept_rmANOVA?

Thank you!

Mensen commented 6 years ago

Let me know if you have any questions regarding the interpretation.

It seems you figured out the electrode locations but just in case... All those fields you mentioned are fields of the electrode locations structure you need. So that whole variable (whatever it's called, but I usually name it e_loc) is the input.

To run the ANOVA you want to use the ept_TFCE_ANOVA script... if you put 'r' into the type field then it will run the ept_rmANOVA later within the script. The input to ept_TFCE_ANOVA should be the 2x2 cell array with your data.

Mensen commented 6 years ago

I see you @Em-Fl made a comment on this in my notifications but unfortunately I can't see the comment here. Could you repost? Something must have gone wrong.

Em-Fl commented 6 years ago

Mensen! Hi! Thank you, I was able to run the analysis without problems.

I see in your thesis that there is a feature in the result viewer to plot cluster results selectively, but it is missing in the version I've downloaded (I'm using the most recent version). Is there a script in one of the dependencies to do this? If there is, I'm having trouble finding it.

I'm also wondering about the cluster results: how to interpret them when you end up with a number of clusters, some of them of size 1 (one time point) and others with multiple channels and multiple time points involved. Also, running the 2x2 ANOVA I end up with an interaction cluster that appears to be continuous in time.

Mensen commented 6 years ago

There is another function in the toolbox to give you a table of each of the connected, and significant clusters of results:

[cluster_results] = ept_calculateClusters(Results, channel_neighbours, threshold);

There you can use the Results structure generated by the TFCE analysis, the channel neighbourhood, which should be in the Info of your analysis (or just generated again with channel_neighbours = ept_ChN2(Info.Electrodes);), and the specific p-value threshold you are looking for clusters at (typically threshold = 0.05, but sometimes you want to find others, so it's optional).

The cluster results will let you describe the overall shape of the significant channels found. This would be a typical example for what I would write...

"Analysis showed that Condition A had significantly higher amplitudes compared to Condition B for 56 unique channels in the frontal region for the time range from 280 - 320 ms (peak channel: Fz; T = 6.024, p = 0.002)."

All of the above info can be obtained from the table of cluster_results generated above.


Regarding the interaction effect: do you mean that there is a constant value over time? Or just that there is a significant value (but fluctuating)?

It's certainly an odd result, but it could be produced by a systematic error that added a constant value to only one of the conditions. This could be a mistake in the preprocessing, or even just a bad baseline correction applied to a single condition. Let me know if you can figure out what exactly the effect looks like (you can try to visualise the ERPs in the ResultViewer). Otherwise I'm happy to have a look into the data myself and try to figure out what happened.

Em-Fl commented 6 years ago

Yes, I had found the ept_calculateClusters script, but I was wondering about the plotting tool: is there a script for that in one of the dependencies? Also, when I run this script I get the following information:

channel_peak
sample_peak
max_t_value
p_value_peak
cluster_size
unique_channels
unique_samples
sample_range

Is there a way to get the specific channels involved in each cluster? I'm guessing there is, as they appear marked in the topoplot. I would need that information in a form I could use to plot each cluster's spatial distribution at different times.

As for the interpretation of the cluster results, I end up with 252 clusters, some of which are probably noise because they have a size of 1. The rest vary in size, so I'm not clear on how large a cluster should be in order to be relevant (in spite of being significant).

For the 2x2 ANOVA I get a significant cluster for the entire temporal range of my data. I'm going to take a look to see if I can find an error. I'm sure it's not a baseline error (I'm running the analysis on the time points after the baseline) or a problem in a preprocessing stage, as it's data I've already analysed. Maybe I made a mix-up when arranging the cell arrays.

Mensen commented 6 years ago

In the latest update 8d8f446b039b6014fb93cb3452d1862d06a89caa I added the actual cluster locations in time/space in the resulting structure so you can find the information you are looking for there. Let me know if you create some interesting graphs of this output that would potentially be useful for other users and I'll add it to the toolbox (or even better send a pull request with the tool).

252 clusters sounds like too many. What is the size of your data? With my ERP datasets I usually get between 1 and 10 (at most). EEG just has too much correlation between neighbouring time points and channels not to naturally form a lot of clusters of results. For there to be 252 independent clusters of significant time points there would have to be a huge number of time points or sparsely connected channels. Your intuition is probably right about many of them being noise... but in principle these should not be found as significant if they are only present in a couple of individuals, especially because the TFCE methodology punishes data which does not have the support of its neighbouring time points and channels. So it's still a strange result to have.

Why run the analysis on only the time points after the baseline? It's good to be able to show there are no significant differences during the baseline to support the rest of the significant results. Moreover, baseline correction could still influence the results (a systematic increase of amplitude for all post-baseline time points if there was an artefact in the baseline that changed the mean)... so this remains a possibility despite you only examining the post-baseline time points.

Em-Fl commented 6 years ago

My data size is 23 (subjects) x 64 (electrodes) x 512 (time points). I ended up working with the data after the baseline because I was using a script I made myself to do the permutation analysis, and it is really slow, so the fewer time points the better. But yes, it is a good idea to include the baseline period.

I just ran the analysis including the baseline and the output is multiple clusters (>200), even in the baseline window... something is clearly not right. If I look at the p-values matrix I see lots of repeated values, which seems odd.

Mensen commented 6 years ago

Indeed. Would you mind sending me the results file and I can have a look? My email is research.mensen (at) gmail ... also which options did you use?

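The p_values are determined directly by the number of permutations, so repeated p_values aren't such a surprise. With N permutations only around N distinct p-values are possible, so many points necessarily share the same value.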
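A small simulation can make this concrete. It is only a sketch: the null distribution is random noise standing in for the maximum statistic of each permutation, and the (count + 1) / (N + 1) convention used for the p-value is an assumption that may differ from what the toolbox does internally.

```matlab
% Hypothetical numbers: 100 permutations, 5000 channel/time points
n_permutations = 100;
n_points = 5000;

% Stand-in for the maximum statistic obtained in each permutation
null_max = randn(n_permutations, 1);

% Stand-in for the observed statistic at each channel/time point
observed = randn(n_points, 1);

% Permutation p-value: share of null values at least as large as observed
p_values = zeros(n_points, 1);
for k = 1:n_points
    n_exceed = sum(null_max >= observed(k));
    p_values(k) = (n_exceed + 1) / (n_permutations + 1);
end

% However many points are tested, p can take at most N + 1 distinct values
n_unique = numel(unique(p_values));
fprintf('%d points share %d distinct p-values\n', n_points, n_unique);
```

Under this convention the smallest attainable p-value is 1 / (N + 1), which is one reason to run more permutations when finer resolution is needed.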

Em-Fl commented 6 years ago

Mensen, finally it was a mistake with the data structure, it's working now! Thank you !

Mensen commented 6 years ago

Great to hear you found the problem... what was the problem in the structure? Perhaps I can implement a check for these sorts of things within the toolbox so it can warn others of potentially common errors.

Good luck with the analysis and interpretation!

Em-Fl commented 6 years ago

It was a reshape error: the data had the right dimensions but all the rows were mixed up. Dumb mistake.
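One common way to end up with the right dimensions but scrambled contents, shown here as an illustration and not necessarily what happened in this case, is to use reshape where permute is needed. reshape only reinterprets the existing element order, while permute actually moves the dimensions:

```matlab
% Hypothetical array stored as channels x timepoints x participants
n_channels = 2;
n_timepoints = 3;
n_participants = 2;
A = reshape(1:(n_channels * n_timepoints * n_participants), ...
            [n_channels, n_timepoints, n_participants]);

% Target layout: participants x channels x timepoints
target_size = [n_participants, n_channels, n_timepoints];

% reshape gives the requested size but mixes up which value belongs where
wrong = reshape(A, target_size);

% permute gives the same size and keeps every value with its own indices
right = permute(A, [3 1 2]);

disp(isequal(size(wrong), size(right)));   % sizes match
disp(isequal(wrong, right));               % contents do not
```

A quick safeguard is to compare one known value before and after, for example checking that right(p, c, t) equals A(c, t, p) for some participant p, channel c and sample t.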

Mensen, I was wondering about doing a follow up analysis. For example, let's say I have a 2x2 design, and find an interaction between those two factors. If I want to do a follow up test for each factor, how should I do this? I was thinking of taking the time window in which I find the interaction and the channels that take part in the cluster and do a dependent sample t test. Is it possible to do this with your software? Thank you in advance!

Mensen commented 6 years ago

Hi!

So, if you wanted to take a region/time_window of interest, then you could just average the data over those channels and time points and run a series of normal t-tests in matlab on the averages, since each participant is then reduced to a single value per condition (see the functions ttest / ttest2 in matlab).
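A minimal sketch of that region-of-interest approach follows, using random data and made-up channel and sample indices (data1, data2 and the index vectors are placeholders). The paired t-value is computed by hand so the arithmetic is visible; ttest from the Statistics and Machine Learning Toolbox is called only if it is available.

```matlab
% Hypothetical data: participants x channels x timepoints per condition
n_participants = 23;
data1 = randn(n_participants, 8, 50);
data2 = randn(n_participants, 8, 50);

% Hypothetical region and window of interest
channels_of_interest = [2 3 4];
time_samples_of_interest = 20:30;

% Average over time (dimension 3), then channels (dimension 2):
% one value per participant and condition
roi1 = mean(mean(data1(:, channels_of_interest, time_samples_of_interest), 3), 2);
roi2 = mean(mean(data2(:, channels_of_interest, time_samples_of_interest), 3), 2);

% Paired (dependent samples) t-statistic on the differences
d = roi1 - roi2;
t_value = mean(d) / (std(d) / sqrt(n_participants));
df = n_participants - 1;
fprintf('t(%d) = %.3f\n', df, t_value);

% With the toolbox installed, ttest also returns the p-value
if exist('ttest', 'file')
    [h, p, ci, stats] = ttest(roi1, roi2);
    fprintf('ttest: t = %.3f, p = %.3f\n', stats.tstat, p);
end
```

For a within-subject design the paired form ttest(roi1, roi2) is the relevant one; ttest2 assumes independent groups.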

However, if you still wanted to keep the channels and time points separately and run a TFCE approach for the levels of your factors alone, then you could just find those channels and windows (presumably from the output of ept_calculateClusters) and then just use those indices on the input to the ept_TFCE function... like this for example:

ept_TFCE(data1(:, channels_of_interest, time_samples_of_interest), ...
    data2(:, channels_of_interest, time_samples_of_interest), ...
    elocs(channels_of_interest), ...

with all the other parameters you ran your last analysis with (sampling_rate etc).

Good luck!

JD-Zhu commented 6 years ago

Sorry for reviving this thread - I have a related question about interpretation.

I ran ept_TFCE on some ERF data and used ept_calculateClusters to find out which channels and time points formed the clusters. A few clusters showed up, but they are all very small: mostly comprising 1-2 channels, spanning 1-3 adjacent samples (my sampling rate was 200Hz, so this would mean the effect lasts 5~10ms). There is even a cluster with 1 channel and 1 sample. Does this sound weird to you? Might I have done something wrong?

Many thanks, Judy

Mensen commented 6 years ago

Hi Judy! Very small clusters of results are a little unusual but in principle possible with the TFCE approach, which still produces a p-value for each channel / time-point / frequency-bin separately. It's more unusual just because with EEG data you tend to get some natural smoothing (volume conduction, filtering etc.), which makes these smaller clusters less likely to occur.

I'd have a look at the overall shape of the results. Are the neighbouring points around these significant clusters near significant, or do these points seem to come out of nowhere? Are they in line with some of your experimental hypotheses, or do they seem randomly distributed? Are they consistent across all participants or just driven by a couple of more outlying data-points?

Hope that helps a bit!

JD-Zhu commented 6 years ago

Thanks Armand! I've checked the overall shape of the results. The significant clusters certainly don't come out of nowhere - the neighbouring points are usually marginally significant (p < 0.1).

The timing of the clusters is in line with my hypotheses. It's just that the short duration of the effects (5~10ms) really seems strange. Am I supposed to set my own "threshold" regarding what I do or do not consider a cluster (e.g. a minimum cluster size of 3)? Or should I report all clusters found (even if one only contains 1 channel and 1 time point)? When the neighbouring points are marginally significant, is there a way I can take that into account to form larger clusters?

Many thanks, Judy

Mensen commented 6 years ago

Taking the neighbouring points into account is precisely what the TFCE method already incorporates into its approach... so it seems quite likely that those small clusters are only significant because of the support they get from their neighbours (even if not significant themselves).
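To illustrate how neighbour support enters the score, here is a simplified one-dimensional sketch of the TFCE integral (time only, no channel neighbourhood). It is not the toolbox's implementation: the example values, the step size dh, and the exponents E = 0.5 and H = 2 are assumptions chosen for illustration, and the toolbox may use different settings.

```matlab
% Hypothetical statistic over 10 samples: two peaks of equal height,
% one with supporting neighbours (sample 4) and one isolated (sample 9)
t_values = [0 1 2 3 2 1 0 0 3 0];

E = 0.5;    % extent exponent (assumed)
H = 2;      % height exponent (assumed)
dh = 0.1;   % threshold step (assumed)

tfce = zeros(size(t_values));
for h = dh:dh:max(t_values)
    % Samples that survive the current threshold
    above = t_values >= h;

    % Locate contiguous runs of surviving samples
    edges = diff([0 above 0]);
    run_start = find(edges == 1);
    run_end = find(edges == -1) - 1;

    % Every sample in a run gains extent^E * h^H * dh
    for r = 1:numel(run_start)
        idx = run_start(r):run_end(r);
        extent = numel(idx);
        tfce(idx) = tfce(idx) + (extent ^ E) * (h ^ H) * dh;
    end
end

disp(tfce);
```

Both peaks have the same raw height, but the supported peak accumulates a larger score because its runs are wider at the lower thresholds. This is the sense in which a single significant sample can owe its significance to neighbours that are not significant themselves.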

The threshold-free approach eliminates any arbitrary cut-offs for the results. Each and every channel and time point actually has its own precise p-value that can be reported. The ept_calculateClusters step is just a way to organise the reporting of your results. How and what you then report is of course completely up to you. I think the idea of reporting all your results is scientifically rigorous and open... so I'd be in favour of that... but I am also hesitant to advise any more than that.

Good luck!

JD-Zhu commented 6 years ago

Ah that makes sense. Thanks so much for the explanation!