Mensen / ept_TFCE-matlab

Advanced EEG Statistics

Loading New Data #3

Open speters14 opened 7 years ago

speters14 commented 7 years ago

Just a basic question about loading data into the tool. Rather than using .dat files, I'd prefer to use my preprocessed .mat files, which are organized as subject x channel x spectral power bin (for example, 7 x 124 x 251). What is the best way to import this data format into the toolbox? Can it be done through the GUI (i.e., which button would I use), or which variable would I manually assign the data to?

Also, I am not using ERPs, as this is sleep data (spontaneous EEG). Does the tool require ERP trials? My data is already grouped by age and gender (these are the factors to which I would like to apply TFCE).

Thanks!

Mensen commented 7 years ago

The data can be used directly if already loaded into the matlab workspace...

Results = ept_TFCE(...
    group_1_data, ... % subject by channel by frequency bin
    group_2_data, ... % also a subject by channel by frequency bin
    channel_locations, ... % with EEGLAB this is simply EEG.chanlocs from any one participant
    'rsample', EEG.srate, ... % for frequency data this should be the frequency bin size instead
    'nPerm', 5000, ... % anything about 2000 probably provides consistent p-values
    'E_H', [0.66, 2], ... % set the parameters differently if you like
    'type', 'd', ... % use 'i' for independent groups, 'd' for repeated measures
    'saveName', 'tfce_results_file.mat');

The above code is for a single factor comparison (basically T-tests). The data can be arranged slightly differently to use the ept_TFCE_ANOVA for multiple factors if you want to explore age and gender (and their interaction at the same time). Let me know if that is of interest.
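A minimal sketch of the preparation step, assuming the preprocessed power values sit in one subject x channel x bin array and a separate vector codes each subject's group. The names `all_power` and `age_group` are hypothetical, and the random numbers stand in for a real `load` call. The `ept_TFCE` call is left commented out because it needs the toolbox and real channel locations.

```matlab
% Synthetic stand-in for a preprocessed file: 7 subjects x 124 channels x 251 bins.
% With real data this would come from something like: S = load('my_power.mat');
n_subjects = 7; n_channels = 124; n_bins = 251;
all_power = rand(n_subjects, n_channels, n_bins);

% Hypothetical grouping variable: 1 = younger group, 2 = older group.
age_group = [1 1 1 1 2 2 2];

% Split along the first (subject) dimension; channels and bins are untouched.
group_1_data = all_power(age_group == 1, :, :);
group_2_data = all_power(age_group == 2, :, :);

fprintf('group 1: %s\n', mat2str(size(group_1_data)));
fprintf('group 2: %s\n', mat2str(size(group_2_data)));

% The two arrays can then be passed as in the call above, e.g.
% Results = ept_TFCE(group_1_data, group_2_data, channel_locations, ...
%     'type', 'i', 'nPerm', 5000);
```

Because two age groups contain different participants, `'type', 'i'` is the setting that the comment in the call above describes for independent groups.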

speters14 commented 7 years ago

Thanks! I was looking in the wrong files for this text. Once I get this running on my data with a single-factor comparison, I will be interested in the ANOVA for multiple factors and their interaction.


Mensen commented 7 years ago

Good luck!

I'm also working on a general linear model version so that you can include any number of factors and levels in your analysis, but the permutation strategies for these sorts of things can be a bit tricky so they are not out yet. But I'm sure we can discuss options once you get there.

speters14 commented 7 years ago

Thanks Mensen! Since my data is spontaneous EEG (sleep) and not ERPs, I'm not sure that the analyses are running correctly. The results viewing tool appears to give me values across time, but these are actually my frequency bins; time is not a component in my data. I pasted a screenshot below. In this case S12 would actually be bin 12, which corresponds to 2.75 Hz. I'm also wondering about the y-axis values in the xy plot with frequency bins on the x-axis. I'm reading your thesis =) in the hope of getting a better understanding. [image]

speters14 commented 7 years ago

The groups above are two different ages. I'd like to add gender as a factor. I'll take a look at the TFCE_ANOVA file to see if I can understand how to segment the data for that analysis.

speters14 commented 7 years ago

It's also unclear to me how to interpret this results plot, by channel, again given that there is no time dimension in my data. [image]

Mensen commented 7 years ago

The scripts treat the second dimension as time by default. If you had entered your frequency bin size as the sampling rate, the axis would have been translated directly into frequencies. You can still do that by manually editing the results file or, when re-running the analysis, by setting the rSample option to your bin size. It's then only a little annoying to see the units as milliseconds, but the numbers will all line up correctly.

The y-axis is the negative log of the p-values (as stated in the plot title). This way, the lower the p-value, the higher the negative log, so you can easily see at which time point / frequency bin your largest effects are.
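As a small worked example of that transform, assuming a base-10 logarithm (the base used in the toolbox's plot is not stated in this thread, and a natural log would only rescale the axis):

```matlab
% Example p-values, ordered from weak to strong evidence.
p_values = [0.5 0.05 0.01 0.001];

% Negative log transform (base 10 is an assumption made for this sketch).
neg_log_p = -log10(p_values);

% Smaller p-values map to larger values on the y-axis.
disp([p_values; neg_log_p]);

% Height that the conventional 0.05 threshold would have on such an axis.
threshold_height = -log10(0.05);
```

With this scaling, peaks in the plot mark the bins with the smallest p-values, and anything above `threshold_height` would fall below p = 0.05.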

For the ERP comparison, if you add the frequency sampling information, the x-axis would be in Hz (although it would still say "time" unfortunately since there is no inherent way to detect whether your data is time or frequency).
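The bin-to-frequency mapping can be sketched as follows, assuming the first bin sits at 0 Hz and the bins are spaced 0.25 Hz apart. That spacing is inferred from bin 12 corresponding to 2.75 Hz and is not confirmed for the actual data set; the variable names are illustrative.

```matlab
n_bins = 251;        % size of the frequency dimension mentioned in this thread
bin_width_hz = 0.25; % assumed spacing between neighbouring bins

% Frequency of every bin, assuming bin 1 is 0 Hz.
freq_hz = (0:n_bins - 1) * bin_width_hz;

fprintf('bin 12 -> %.2f Hz\n', freq_hz(12));
fprintf('last bin -> %.2f Hz\n', freq_hz(end));

% Reverse lookup: index of the bin closest to a frequency of interest.
[~, bin_at_10hz] = min(abs(freq_hz - 10));
```

A vector like `freq_hz` can be used to relabel the sample axis by hand when reading the results.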

Mensen commented 7 years ago

A few further comments and suggestions...


How are you calculating the frequency bins? If bin "12" is equal to 2.75 Hz, it sounds like you are using quite long time windows within your FFT. This is a separate issue of course, but using long time segments with continuous EEG activity will generally push the boundaries of the signal stationarity assumption for the FFT. For example, when I transform sleep data into the frequency domain I generally use overlapping windows of 1-2 s (pwelch method).


At the moment the ANOVA version of TFCE only accepts equal group sizes. Do you have equal numbers of male and female recordings at each age? I'll assume you don't, since that would be a big coincidence unless gender was specifically of interest and affected your recruitment strategy. So we can discuss other ways to include the effect of gender (or any other factor) in your data. For example, I'm working on scripts to run linear mixed models at each channel/time/frequency point, so that I can include 5 or 6 factors and model them precisely as I like given the data structure and hypothesis.


Your data appears to be flat above a certain frequency. I expect this is because you filtered above that frequency. In that case, you should definitely not include those bins in the statistical analysis, since they only add noise to the overall signal. So next time you run the analysis, select only the frequency bins that you actually have data for. The same seems to be true for the lower frequencies, so you are probably also including bins at the lower end that you filtered out during pre-processing.
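Restricting the analysis to the pass-band could look like the sketch below. The cut-offs (0.1 Hz and 35 Hz), the 0.25 Hz bin spacing starting at 0 Hz, and the variable names are assumptions for illustration, and the random array stands in for real data.

```matlab
group_1_data = rand(4, 124, 251);   % stand-in: subjects x channels x bins
freq_hz = (0:250) * 0.25;           % assumed frequency of each bin

high_pass_hz = 0.1;                 % assumed filter cut-offs
low_pass_hz = 35;

% Logical mask of the bins that still carry signal after filtering.
keep = freq_hz >= high_pass_hz & freq_hz <= low_pass_hz;

% Drop the filtered-out bins from the third dimension before running statistics.
group_1_trimmed = group_1_data(:, :, keep);
freq_trimmed = freq_hz(keep);

fprintf('kept %d of %d bins (%.2f to %.2f Hz)\n', ...
    sum(keep), numel(keep), freq_trimmed(1), freq_trimmed(end));
```

The same mask would need to be applied to every group so that the bins stay aligned, and `freq_trimmed` keeps track of which frequencies the remaining bins represent.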

speters14 commented 7 years ago

Thanks so much, Armand. This is all quite helpful. Please see my responses below.


SP: The sleep segments are scored in 20-second epochs. The pwelch method is applied to the NREM 2/3 segments with a 4 s Hanning window (no overlap), as seems to be commonly used in the Kurth/Hubert etc. developmental dEEG sleep work. Do you think this is an issue?



SP: You are right, I do not have equal group sizes for gender, although I am still collecting data, so it could be possible. It would be great to have a linear mixed model tool.


SP: That is correct, and thanks for mentioning it. Yesterday I had considered that I should not include the bins that were filtered, but it wasn't clear to me how it would impact the permutation and cluster enhancement methods. I will rerun, as you suggest.


speters14 commented 7 years ago

This is very helpful. Thank you. As an aside, I have found your thesis document to be a really helpful learning tool and guide to the software. Maybe you could add a prominent link to it on GitHub for those needing a little more depth and background than the Neuroimage paper. I'm not sure where I downloaded it from, but I think it may have been your personal website.


speters14 commented 7 years ago

Two additional comments on the data that I shared. My power values were normalized (by bin, to the average of all channels for each individual). Do you have any insight into whether this method is better applied to absolute or normalized data? I had planned to try both. For the most part, the power topoplots of the individuals don't change much when normalized, except a bit at the very low frequencies. Also, the data had been subjected to a 0.1 Hz high-pass filter and a 35 Hz low-pass filter.

Thanks!


Mensen commented 7 years ago

4 second windows actually seem fairly reasonable. This gives a frequency resolution of 0.25 Hz, at the expense of some of the stationarity assumption in your signal. So I suppose I'm curious why you need to be so "precise" with your frequency bins (quotation marks on "precise" because, with the lack of stationarity, you may not be so precise after all). I'm more curious why you don't use an overlap but do use the Hanning window. This means that about half your data is attenuated, with no overlap between windows to compensate for that loss. I doubt this produces huge variability in your data; I'm mainly curious about the rationale behind the choices being made.
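To illustrate the two points above, here is a rough sketch that computes the frequency resolution of a 4 s window and the total taper weight each sample of a 20 s epoch receives with and without overlap. The sampling rate is hypothetical, and the Hann window is built by hand so that no toolbox is needed; this is not the code either of us uses, and it looks only at the taper weights, not at the resulting power estimates.

```matlab
fs = 100;                 % hypothetical sampling rate in Hz
win_sec = 4;              % window length discussed above
n_win = fs * win_sec;     % samples per window

% Frequency resolution of an FFT is the inverse of the window length.
freq_resolution_hz = 1 / win_sec;

% Periodic Hann window built by hand (no toolbox required).
n = (0:n_win - 1)';
hann_win = 0.5 * (1 - cos(2 * pi * n / n_win));

% Total weight each sample of a 20 s epoch receives across all windows.
n_epoch = 20 * fs;
weight_no_overlap = zeros(n_epoch, 1);
weight_half_overlap = zeros(n_epoch, 1);

for start = 1:n_win:(n_epoch - n_win + 1)        % back-to-back windows
    idx = start:(start + n_win - 1);
    weight_no_overlap(idx) = weight_no_overlap(idx) + hann_win;
end
for start = 1:(n_win / 2):(n_epoch - n_win + 1)  % 50% overlap
    idx = start:(start + n_win - 1);
    weight_half_overlap(idx) = weight_half_overlap(idx) + hann_win;
end

fprintf('resolution: %.2f Hz\n', freq_resolution_hz);
fprintf('mean weight, no overlap: %.2f\n', mean(weight_no_overlap));

% Away from the epoch edges, overlapping windows weight every sample equally.
interior = (n_win / 2 + 1):(n_epoch - n_win / 2);
fprintf('interior weight range, 50%% overlap: %.2f to %.2f\n', ...
    min(weight_half_overlap(interior)), max(weight_half_overlap(interior)));
```

Without overlap, samples near each window edge receive almost no weight; with 50% overlap the neighbouring window picks them up again.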


Re: linear mixed models: I'm now in the middle of running my first "true" linear mixed model with TFCE correction; hopefully the results make sense and I can share the code. It is, however, painfully slow and will need to be optimized before it really counts as a good alternative.


Re: normalization. Personally I'm not a fan of this sort of normalization, for two reasons (but you can talk to Brady, who can give it a reasonable defense). The first is that it can become hard to interpret exactly where the changes have occurred. Say, for example, that all the differences in one group are highly focal and frontal. Normalization over all channels would create a drop in power in all other channels where there was none, and would attenuate your true focal/frontal effect. Secondly (and relatedly), for the statistics: if your effect is local, this sort of normalization will artificially create a large cluster of effects everywhere else, which might inflate any cluster statistics you use. TFCE should help balance the effect size and strength accordingly, but I can't be completely sure that this artificial bias is entirely accounted for.
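A toy illustration of the first point, using made-up numbers: one subject per group, ten channels, one frequency bin, and a focal increase at a single channel. All names and values are hypothetical.

```matlab
n_channels = 10;

% One hypothetical subject per group, one frequency bin, arbitrary power units.
power_a = ones(1, n_channels);   % group A: flat topography
power_b = ones(1, n_channels);   % group B: identical, except ...
power_b(1) = 2;                  % ... a focal increase at channel 1

% Normalise each subject to the mean over all channels (per bin).
norm_a = power_a / mean(power_a);
norm_b = power_b / mean(power_b);

% Group difference before and after normalisation.
diff_abs = power_b - power_a;
diff_norm = norm_b - norm_a;

disp('absolute difference per channel:');   disp(diff_abs);
disp('normalised difference per channel:'); disp(diff_norm);
```

After this normalisation the differences sum to zero across channels by construction, so the focal increase at channel 1 is reduced and the nine unaffected channels show a small decrease that was not in the original data.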

Good luck and let me know how things turn out with the simple comparisons and hopefully I can help with the more complex design as well.

AnnikaBeebe commented 1 year ago

I am also having trouble loading data into EEGLAB. After a lot of searching and manipulating the code I can finally load the file, but the data is not being processed properly, and even if I try to run a single filter on the data it flashes error messages. Is there any way you can help?