0todd0000 / spm1dmatlab

One-Dimensional Statistical Parametric Mapping in Matlab.
GNU General Public License v3.0
28 stars 13 forks

The interpretation of the results #29

Closed PouyanMehryar closed 8 years ago

PouyanMehryar commented 8 years ago

Hi Todd,

I used the code below to investigate the difference between the two variables:

SPM1  = spm1d.stats.ttest2(yA, yB);
SPMi1 = SPM1.inference(0.05, 'two_tailed', true);
disp(SPMi1)

The graph below shows the SPM1 and SPMi1 results for yA and yB.

1: I wonder if my interpretation is correct based on the attached plot: yB was significantly greater than yA at 8-11%, and yA was significantly different from yB at 95-98%. Therefore, the null hypothesis was rejected.

2: I want to know if the area above the mean (zero) relates to yA, and whether, if SPM{t} passes the t-critical threshold, yA is significantly different from yB.

3: Is that determined by which variable comes first in SPM1 = spm1d.stats.ttest2(yA, yB)?

4: How could I make the red dotted line (t-critical) thicker?

[image: SPM{t} results plot]

Thank you in advance for your response.

0todd0000 commented 8 years ago

Hi, I'll answer each point below...

1: I wonder if my interpretation is correct based on the attached plot: yB was significantly greater than yA at 8-11%, and yA was significantly different from yB at 95-98%. Therefore, the null hypothesis was rejected.

That's correct. To expand on the reason for rejection, the following might work: the null hypothesis was rejected because the t statistic field traversed the critical threshold at alpha = 0.05.

2: I want to know if the area above the mean (zero) relates to yA, and whether, if SPM{t} passes the t-critical threshold, yA is significantly different from yB.

The t statistic is computed essentially as (meanA - meanB) divided by a measure of the variance (the standard error of the difference), so areas above zero imply (meanA > meanB), and areas below zero imply the opposite: (meanA < meanB). If SPM{t} crosses either threshold you can conclude a significant difference.
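To make the sign convention concrete, the pointwise statistic can be sketched like this (a rough illustration of a pooled-variance two-sample t statistic, not the spm1d source; variable names are assumptions):

```matlab
% Rough sketch: two-sample t statistic at each time node.
% yA, yB are (trials x nodes) arrays; equal variances are assumed.
mA = mean(yA, 1);   mB = mean(yB, 1);
nA = size(yA, 1);   nB = size(yB, 1);
sp = sqrt(((nA-1)*var(yA,0,1) + (nB-1)*var(yB,0,1)) / (nA + nB - 2));  % pooled SD
t  = (mA - mB) ./ (sp * sqrt(1/nA + 1/nB));  % positive wherever meanA > meanB
```

The whole 1 x nodes curve t is what SPM{t} depicts; its sign at each node follows (meanA - meanB) directly.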

3: Is that determined by which variable comes first in SPM1 = spm1d.stats.ttest2(yA, yB)?

Yes, the sign of the t statistic is the same as the sign of (meanA - meanB). Thus spm1d.stats.ttest2(yA, yB) and spm1d.stats.ttest2(yB, yA) will yield opposite SPM{t} curves.

4: How could I make the red dotted line (t-critical) thicker?

The easiest way is probably to use the handles returned by the "plot" function like this:

handles = SPMi1.plot();  % 1x5 graphics array (i.e. array of handles)
set(handles(3), 'linewidth', 5)   % upper critical threshold
set(handles(4), 'linewidth', 10)  % lower critical threshold

The "plot" returns handles to the main plotted elements including (in approximate order):

  1. SPM{t}
  2. test statistic field, the zero datum line
  3. upper critical threshold
  4. lower critical threshold, and
  5. supra-threshold cluster patches.

The order of elements 1-3 won't change, but elements 4 and 5 will change depending on: (a) whether or not two-tailed inference is conducted, and (b) whether or not there are any supra-threshold clusters.
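If you're unsure which handle corresponds to which element in a given figure, you can probe them directly (a quick check, assuming SPMi1 is the inference object from above):

```matlab
handles = SPMi1.plot();
for i = 1:numel(handles)
    fprintf('handle %d: %s\n', i, get(handles(i), 'Type'))  % e.g. 'line' or 'patch'
end
```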

PouyanMehryar commented 8 years ago

Thank you for your detailed and comprehensive response.

I have some more follow up questions regarding my data.

I have motion capture data (Kinetics and Kinematics) from one subject who used different prosthetic devices (5 prostheses) at 3 different walking speeds. Now I would like to use an appropriate SPM statistical test to compare the moments produced by the different prostheses during different speeds.

1: I wonder if I should use a paired t-test for comparison between two prostheses worn by the one subject at a specific speed (e.g. normal)?

2: Should I use one-way ANOVA (1D) to compare all 5 prostheses in terms of knee moments for the same walking speed (e.g. normal), perhaps with post-hoc tests afterwards?

3: The subject walked at 3 speeds (slow, normal and fast) while wearing each of these prostheses. Should I use one-way ANOVA (1D) to see the effect of speed on e.g. the moment around the knee for a particular prosthesis (e.g. C-leg)?

Thank you in advance for your response

0todd0000 commented 8 years ago

Hello, thank you for your questions.

1: I wonder if I should use a paired t-test for comparison between two prostheses worn by the one subject at a specific speed (e.g. normal)?

That's fine if there's only one speed. With multiple speeds this should only be done in post hoc analysis.

2: Should I use one-way ANOVA (1D) to compare all 5 prostheses in terms of knee moments for the same walking speed (e.g. normal), perhaps with post-hoc tests afterwards?

A two-way repeated measures ANOVA (spm1d.stats.anova2rm) may be better, with factors: PROSTHESIS and SPEED.
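A hypothetical call might look like this, assuming Y is an (observations x nodes) array with one row per trial, and A, B and SUBJ are label vectors giving each row's prosthesis, speed and subject (names and layout are assumptions, not your actual data):

```matlab
% One row of Y per observation; label vectors give its factor levels.
spm  = spm1d.stats.anova2rm(Y, A, B, SUBJ);  % A: PROSTHESIS, B: SPEED
spmi = spm.inference(0.05);                  % main effects A, B and interaction AB
disp(spmi)
```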

3: The subject walked at 3 speeds (slow, normal and fast) while wearing each of these prostheses. Should I use one-way ANOVA (1D) to see the effect of speed on e.g. the moment around the knee for a particular prosthesis (e.g. C-leg)?

I'd suggest a two-way repeated measures ANOVA (as above).

Cheers,

Todd

0todd0000 commented 8 years ago

Hi Pouyan, This issue has been inactive for a few weeks so I'll close it for now, but please feel free to repost to this issue or create a new issue if anything comes up. Todd

PouyanMehryar commented 7 years ago

Hi Todd,

Thank you for incorporating non-parametric tests into SPM.

I performed a normality test on my data and the result clearly shows that it is not normal.

[image: normality test results]

However, when I performed the non-parametric t-test and the parametric t-test, the results were analogous.

1: How can one interpret this?

2: Since the results are the same, can I continue using the parametric test (a two-sample t-test)?

[image: parametric vs non-parametric results]

0todd0000 commented 7 years ago

Hi,

1: How can one interpret this?

It looks like the normality test result is driven primarily by a single outlier (the black line which has a value of about 0.35 between time=40 and time=80). If you remove that observation I suspect that the normality test's X2 value will drop substantially during this time period. Presuming the X2 value drops, I'd interpret the results as having an outlier whose effects on parametric inference are not qualitatively large.

2: Since the results are the same, can I continue using the parametric test (a two-sample t-test)?

In general: non-parametric results are always valid, and parametric results are valid only when the residuals are normally distributed. Provided there are no qualitative differences between the parametric and non-parametric results I think it would be fine to report either as the main result, and to possibly report the other as a supplementary result. Or to answer your question more directly, I think it's fine to use the parametric result in this case provided you acknowledge the outlier and state that this departure from normality was not strong enough to render the parametric results invalid.
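For reference, the non-parametric counterpart of the two-sample test is called like this in spm1d (the 'iterations' value here is just an example):

```matlab
snpm  = spm1d.stats.nonparam.ttest2(yA, yB);
snpmi = snpm.inference(0.05, 'two_tailed', true, 'iterations', 10000);
disp(snpmi)
```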

Todd

PouyanMehryar commented 7 years ago

Thank you for your prompt response.

The figures presented here are associated with EMG data, and this kind of variation is frequently expected. I don't know whether the outlier should be removed or left in, as an expected feature of stochastic data such as EMG. What is your suggestion in this regard?

I have some generic questions related to normality and parametric tests, which I would much appreciate your answering.

1: I think that, depending on the type of analysis, the normality test could be applied separately or jointly to the datasets (in this case Ya = 24x100 and Yb = 6x100)? Obviously population b has fewer trials, so is it expected not to be normally distributed if a one-sample t-test is performed? However, if it is combined with the other population for a two-sample t-test, then because the combined dataset contains a larger number of trials it could be normal?

1b: Could it happen that the datasets are normal under the normality test for one analysis (e.g. t-test) but not for another (e.g. regression)?

2: Do we have the median difference in the case of the non-parametric test, or is the mean difference still used?

3: I have a dataset from two populations, but for each, instead of EMG activity from different trials, I have the average EMG from each participant (for each muscle). Let's say each group consists of 10 subjects. Could I perform a two-sample t-test (Ya = 10x100 and Yb = 10x100) to find the difference between the two groups (for each muscle), or do I need all trials from each muscle to account for the mean and variance?

Thank you in advance for your response!

0todd0000 commented 7 years ago

Hi, I'll answer below...

The figures presented here are associated with EMG data, and this kind of variation is frequently expected. I don't know whether the outlier should be removed or left in, as an expected feature of stochastic data such as EMG. What is your suggestion in this regard?

If you think the data are accurate then I'd suggest keeping them. My suggestion above regarding removing the outlier pertained only to normality probing: if you remove the observation, then re-run the normality analysis, and the results show greater normality, then you will be able to conclude that the non-normality is caused by the outlier. However, the outlier should only be removed from the main analyses if there was something fundamentally wrong with the measurement.

1: I think that, depending on the type of analysis, the normality test could be applied separately or jointly to the datasets (in this case Ya = 24x100 and Yb = 6x100)? Obviously population b has fewer trials, so is it expected not to be normally distributed if a one-sample t-test is performed? However, if it is combined with the other population for a two-sample t-test, then because the combined dataset contains a larger number of trials it could be normal?

It depends on your experimental question. If your hypothesis pertains to population normality then it would indeed make sense to look at Ya and Yb separately. A smaller sample size does not imply that the data are more likely to be non-normal; normality tests control for sample size, so that all experiments, irrespective of sample size, will reach significance with a probability of alpha when the data are in fact normally distributed.

In general I'd suggest that, unless you have a specific hypothesis regarding the population distribution, you should regard normality tests as complementary information that may explain differences between parametric and non-parametric results.

1b: Could it happen that the datasets are normal under the normality test for one analysis (e.g. t-test) but not for another (e.g. regression)?

Yes, when the residuals are different then the normality results are also generally different. However, please note that the type of test is not optional; the test follows directly from the experimental design. There shouldn't be different normality results for a given experimental dataset.

2: Do we have the median difference in the case of the non-parametric test, or is the mean difference still used?

The normality tests in spm1d are parametric, and only mean differences (or more precisely: the residuals) are relevant to those tests. Non-parametric normality / distribution tests usually rely on ranks, not on median differences.

3: I have a dataset from two populations, but for each, instead of EMG activity from different trials, I have the average EMG from each participant (for each muscle). Let's say each group consists of 10 subjects. Could I perform a two-sample t-test (Ya = 10x100 and Yb = 10x100) to find the difference between the two groups (for each muscle), or do I need all trials from each muscle to account for the mean and variance?

Using only subject means is equivalent to using all trials when the data are normal. Population inference is concerned only with effects at the inter-subject level, and thus only with inter-subject variability. Since inter-subject variability is generally much larger than intra-subject variability, the latter can usually be ignored.
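In other words, collapse each subject's trials to a single mean curve before testing. A minimal sketch, assuming each subject's trials are stored in a cell array (trialsA and trialsB are hypothetical names, not from your data):

```matlab
% trialsA, trialsB: 1x10 cell arrays, each cell an (nTrials x 100) array
yA = zeros(10, 100);
yB = zeros(10, 100);
for s = 1:10
    yA(s, :) = mean(trialsA{s}, 1);  % one mean curve per subject
    yB(s, :) = mean(trialsB{s}, 1);
end
spm  = spm1d.stats.ttest2(yA, yB);   % inference then proceeds as usual
spmi = spm.inference(0.05, 'two_tailed', true);
```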

Todd

PouyanMehryar commented 7 years ago

Hi Todd,

Thanks for your continuous support.

My question is about Anova2onerm. I have 24 subjects in 2 groups:

Factor A (GROUP): Healthy = 13 and Amputee = 11
Factor B (SPEED): Healthy (13 Slow, 13 Normal, 13 Fast) and Transfemoral (11 Slow, 11 Normal, 11 Fast)

Basically, the same subjects within each group performed the task at different speeds. I understand that since the numbers of healthy and amputee subjects are not the same (unbalanced datasets) an error was produced. I reduced the healthy data to make the number of subjects equal to the amputee group, and then it worked.

1) I wonder if Anova2onerm is the correct test to do?

2) How can one interpret the results below?

[image: SnPM{F} results plot]

SnPM{F} inference list (1D)
   design      : ANOVA2onerm
   nEffects    : 3
   nPermUnique : 5.443449e+92
   nPermActual : 500
   Effects:
      A    F = (1x100) array   h0reject = 0
      B    F = (1x100) array   h0reject = 1
      AB   F = (1x100) array   h0reject = 1

3) I have done both non-parametric and parametric tests and the results are analogous. I wonder if there is a possibility to include all the subjects in the healthy group, thereby dealing with the unbalanced dataset.

4) Is it OK to mention in publications that the data were reduced in order to perform the statistical test?

0todd0000 commented 7 years ago

Hi!

  1. I wonder if Anova2onerm is the correct test to do?

A. Yes, that sounds correct.

  2. How can one interpret the results below?

A usual ANOVA interpretation is fine: a main effect of SPEED during early and late time, and an interaction effect between GROUP and SPEED during early time.

  3. I have done both non-parametric and parametric tests and the results are analogous. I wonder if there is a possibility to include all the subjects in healthy group thereby dealing with unbalanced data set.

One way to do it would be to re-run a number of times with different subsets of the healthy group. If the results are qualitatively similar each time it would strengthen the result.

Also, even if the parametric and non-parametric results agree, I'd recommend more than 500 permutations. It's better to use many more permutations; I'd recommend 10,000. Otherwise the results you observe could be an artifact of those particular 500 permutations.

  4. Is it ok to mention that the data was reduced in order to perform the statistical test in publications?

Yes, I don't expect that reviewers would have a problem with that. Mention that unbalanced designs are not yet supported in spm1d, and so you repeated the analyses a number of times with random subsets of one group (see Question 3 above) as a type of sensitivity analysis. Perhaps you could provide results of the sensitivity analysis in the form of: (a) a histogram of critical thresholds calculated with each random subset of the healthy group, and (b) histograms of maximum F values for A, B and AB. I'd expect that the B histogram, for example, would show that all maximum F values are greater than all critical thresholds.
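The sensitivity analysis could be sketched as follows (a rough outline; healthyIDs and buildDataset are hypothetical placeholders for your own subject list and data-assembly code, and the field access for recording results will depend on the spm1d version):

```matlab
% Re-run the analysis with many random subsets of the larger group.
nIter = 100;
for i = 1:nIter
    ids = healthyIDs(randperm(13, 11));        % random 11 of the 13 healthy subjects
    [Y, A, B, SUBJ] = buildDataset(ids);       % hypothetical data-assembly helper
    snpm  = spm1d.stats.nonparam.anova2onerm(Y, A, B, SUBJ);
    snpmi = snpm.inference(0.05, 'iterations', 10000);
    % record each effect's critical threshold (zstar) and maximum F value here
end
% then plot histograms of the recorded thresholds and maxima for A, B and AB
```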

PouyanMehryar commented 7 years ago

Thanks for your prompt and detailed answers. I have some follow-up questions.

The result for "A" shows there is no difference between the groups, i.e. the mean difference between Healthy and Amputees is not significant enough to reject the null hypothesis, but the difference arises when the factor speed is considered. My question is regarding "main A" and how this is achieved: the mean of the data from the Healthy subjects (3 speeds x 11 subjects = 33) was compared with the mean of the data from the Amputees (3 speeds x 11 subjects = 33), and the result was not statistically significant. However, when the speeds, i.e. main B (11 slow, 11 normal, 11 fast), were compared between the groups we can see the difference. Is this interpretation correct (Q1)?

Q2) I wonder if an interaction exists and is significant (as shown in above fig), is it important to evaluate the row (Speed) and column (Group) effects individually?

Q3) What kind of post-hoc test is appropriate in this situation to analyze the data further? E.g. if I want to use a post-hoc t-test for main B (SPEED) with a corrected alpha level, what would the alpha be in this case?

Q3a) Do I need to do 15 different combinations?

Q4) I have come across other cases where main A and main B are not significant but the interaction is. What would be the interpretation of that?

Q5) I wonder if you are aware of any publications in which spm1d Anova2onerm or other ANOVA tests were used, so I could learn more about this test and its interpretation?

Thank you in advance for your response!

0todd0000 commented 7 years ago

Hi Pouyan,

Q1-Q4 can be answered as follows: when interactions exist it is generally not valid to consider main effects. Please refer to general two-way ANOVA sources like this discussion: https://www.researchgate.net/post/Insignificant_main_effects_but_significant_interaction

Q5: spm1d uses standard ANOVA procedures, and interpreting spm1d ANOVA results is identical to standard ANOVA, with the minor exception that the test statistic is (one-dimensionally) continuous. So if you read textbooks describing two-way or M-way RM-ANOVA it should be clear. Another good set of sources is the documentation for commercial software like SPSS or MATLAB, or free packages like R, where lots of documentation exists. If their connection to spm1d is unclear then please let me know.

Todd

0todd0000 commented 6 years ago

Hi Pouyan, Thank you for creating a new issue (#69). I have another favor to ask: next time, please create separate issues for each software problem, like this:

Please note that these two issues are completely separate software issues, so it is better if they appear as different issues. The purpose of this forum is to solve software problems associated with spm1d, so it would be much easier for me if you can separate all software issues as clearly as possible like above. That way, if we solve the first issue we can close it, even if the second issue is not yet solved. When many issues appear together (like here, in issue #29), it is difficult to know which sub-issues have been solved and/or which still remain a problem. Also, when issues are separated it is much easier to refer others to previously solved issues.

Thank you very much for your contributions! Todd

m-a-robinson commented 4 years ago

Dear Mike,

I can’t seem to find this issue on GitHub to post a reply. If you can add the post below to the issues on the code site I’ll be able to respond. In short the critical threshold does seem unusually high for some reason.

Regards Mark

From: schwa021 notifications@github.com
Sent: 05 March 2020 17:37
To: 0todd0000/spm1dmatlab spm1dmatlab@noreply.github.com
Subject: Re: [0todd0000/spm1dmatlab] The interpretation of the results (#29)

Hi Todd,

Thanks for providing this nice software and helpful manuals/background info./etc...

This is my first foray into using spm to analyze continuous (1d) curves, and I have a question about "general" behavior of the critical values for paired t-tests.

Below is an example:

[image: https://user-images.githubusercontent.com/25011558/76007897-66438480-5ed4-11ea-840e-9c36f7a8bbdc.png]

The following MATLAB code indicates the stats I was running:

spm = spm1d.stats.ttest_paired(X1, X2);

spmi = spm.inference(.05, 'two_tailed', true, 'interp', true);

Which generated:

SPM{t} inference
   z        : [1×101 double]
   df       : [1 2]
   fwhm     : 6.3796
   resels   : [1 15.6749]
   alpha    : 0.0500
   zstar    : 44.9278
   h0reject : 0
   p_set    : 1
   p        : []

Looking at the paired pre-post EMG data, I would have "guessed" that there were significant differences near 75% of the gait cycle (maybe elsewhere too). However, the data was not even close to the critical value.

If we re-scale the right-hand plot, we can see that the statistic is "pretty big" (I actually have no basis of reference for gauging how big/small it is, but it seems to be a z-score, so 10 seems big to me).

[image: https://user-images.githubusercontent.com/25011558/76008355-229d4a80-5ed5-11ea-8638-a24a03b19f6c.png]

I'm wondering about how the critical value (zstar I think it's called) tends to scale with degrees of freedom. In this example, there are only three trials for each condition. I am guessing that, despite the wide gap (even as a z-score), there is something going on with having such a low dof that makes the apparent difference insignificant.

If this is the case - do you have any generic advice about the N required for convergence of this statistical approach?

I appreciate any thoughts you have or advice you can give.

-Mike-


