Closed BryanGuillaume closed 5 years ago
I can confirm that the results for Deactivation in case 1 above are not correct: there should be 6 clusters at p = 0.05, not the 3 displayed. Nevertheless, the displayed result for Activation in case 1 seems correct. I suspect that the cluster threshold used for Activation was also used for Deactivation.
I can also see that the value clus k (5% FWE): 58 is the same for Activation and Deactivation (in both case 1 and case 2). Using the information in SwE.WB.clusterInfo, I can compute:
In addition, I feel that the notation used for this (i.e. clus k (5% FWE): 58) might be ambiguous, as a cluster of size 58 actually has a p-value of 0.1, not 0.05.
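For reference, a wild-bootstrap FWE-corrected cluster p-value is the proportion of the max-cluster-size distribution (the nB bootstrap samples plus the original one, i.e. nB + 1 values) at or above the observed cluster size. A minimal sketch, in Python for convenience since the toolbox itself is MATLAB, with a made-up max distribution:

```python
def fwe_cluster_p(k, max_cluster_sizes):
    """FWE-corrected p-value of a cluster of size k, given the max-cluster-size
    distribution (original sample plus nB bootstrap samples: nB + 1 values)."""
    n = len(max_cluster_sizes)
    return sum(m >= k for m in max_cluster_sizes) / n

# Hypothetical max distribution with nB = 19 bootstraps (20 values in total)
max_dist = [10, 12, 15, 20, 22, 25, 28, 30, 33, 35,
            38, 40, 44, 47, 50, 52, 55, 58, 60, 65]
p = fwe_cluster_p(58, max_dist)   # 3/20 = 0.15
```

With such a discrete distribution, a cluster whose size equals a displayed threshold can easily have a p-value above the nominal level, which is the ambiguity pointed out above.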
Sorry for the late reply. This is not good. Can I establish: is this just an error associated with the 'on-the-fly' switching of contrasts in the SPM Results interface, or is it more fundamental?
Or has this been addressed in the new release and so can be closed?
I think the majority of the bugs reported here have actually been fixed in PR #125. I left the issue open because I felt that I had not done enough testing. The latter is rather time-consuming because, as you mentioned, some bugs appear only 'on-the-fly' after switching contrasts in a particular order. I will take some time today to do more thorough testing to see whether all these bugs have been fixed.
After some testing, it seems that the following two bugs have been fixed:
If I start by selecting "Deactivation", I get an empty result
Indeed, when I now start with Deactivation, the results are not empty and seem correct.
the value clus k (5% FWE): 58 is the same for Activation and Deactivation
Indeed, switching from deactivation to activation and vice versa, the estimate now changes. Though I still believe that this value is ambiguous. For example, for the activation, I have clus k (5% FWE): 45. In reality, we have P(clus k > 45) = 5% and P(clus k >= 45) = 5.1% due to the way the WB p-values are computed. Thus a cluster of size 45 has a p-value of 5.1%, not 5%, which is somewhat ambiguous. On the same note, the k value used for the threshold is currently set as the size of the smallest cluster that survives the thresholding. It is therefore computed differently from the value clus k (5% FWE). I am a bit unsure what the appropriate value to use here would be.
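To make the ambiguity concrete, here is a sketch (Python, with an assumed max distribution of nB + 1 = 1000 distinct values) in which the value reported as the 5% FWE threshold itself has P(clus k >= threshold) = 5.1%, while only strictly larger clusters reach 5%:

```python
import math

n = 1000                            # nB = 999 bootstraps + the original sample
max_dist = list(range(1, n + 1))    # assumed distinct max cluster sizes, sorted
thr = max_dist[math.ceil(0.95 * n) - 1]       # 950th smallest value, i.e. 950
p_ge = sum(m >= thr for m in max_dist) / n    # P(k >= thr) = 51/1000 = 0.051
p_gt = sum(m > thr for m in max_dist) / n     # P(k > thr)  = 50/1000 = 0.050
```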
Below are the results illustrating the points above.
The bug seems to be linked to the variable thresDesc, which is saved in xSwE. It is first set by the user in the UI at the beginning of the result display, which is fine. However, the code overwrites its value later. Thus, when we change the contrast, xSwE.thresDesc is no longer correct.
Below is the relevant piece of code:
try
    thresDesc = xSwE.thresDesc;
catch
    if strcmp(orig_infType, 'clus')
        % For WB we have FWE or p/k specification if the user
        % originally ran a clusterwise analysis.
        str = 'FWE|none';
        thresDesc = spm_input('extent p value adjustment to control','+1','b',str,[],1);
    else
        % If the user originally ran voxelwise or TFCE we have no
        % FWE clusterwise map from the bootstrap.
        thresDesc = 'none';
    end
end

switch thresDesc
    case 'FWE' % Family-wise false positive rate
        ....
        pu = SwE.WB.clusterInfo.primaryThreshold;
        thresDesc = ['p<' num2str(pu) ' (unc.)'];
Note that this happens for other WB options in the code too. I will try to change this and see whether it fixes the bug.
Digging further into the code, it appears that the sub-function mychgcon in swe_results_ui.m is actually supposed to appropriately convert back the string stored in xSwE.thresDesc when the contrast is changed. It fails to do so correctly for cluster-wise WB analyses, explaining the bug above. I have added the following piece of code to it:
% for cluster-wise WB, the code above does not work. We need to overwrite this
if xSwE.WB == 1 && strcmp(xSwE.clustWise, 'FWE')
    xSwE2.thresDesc = 'FWE';
    xSwE2.u = NaN;
    xSwE2.k = NaN;
end
That seems to fix the bug noted above, as can be seen in the screenshot below:
However, I feel that there might be other bugs when selecting some other options because, as observed with this bug, swe_results_ui may not always send the correct information to swe_getSPM. I feel there is a need to check all options one by one to ensure that this is not the case.
Wow... OK... thanks for chasing all of this. But one thing disturbs me:
... I have clus k (5% FWE): 45. In reality, we have P(clus k > 45) = 5% and P(clus k >= 45) = 5.1% due to the way the WB p-values are computed.
P-values have a well-established and unique definition: the probability of observing a value as or more extreme than the one actually observed. Hence P(clus k >= 45) is the correct calculation.
About the footer... in SPM, just because it was easier, the cluster critical threshold is indeed the smallest non-significant cluster size. This is silly. We have the null distribution, and of course we should compute this from the permutation distribution.
Let me know if this makes sense.
I think I agree with what you are saying. Then we should change the way clus k (5% FWE) is computed by selecting the next cluster size in the max distribution.
Currently, we have
if Ic == 1
    maxClusterSize = sort(SwE.WB.clusterInfo.maxClusterSize);
elseif Ic == 2
    maxClusterSize = sort(SwE.WB.clusterInfo.maxClusterSizeNeg);
else
    error("Unknown contrast");
end
xSwE.Pfc = maxClusterSize(ceil(0.95*(xSwE.nB+1))); % Clusterwise FWE P
Should we instead have something like this?
xSwE.Pfc = maxClusterSize(1 + ceil(0.95*(xSwE.nB+1))); % Clusterwise FWE P
To show the effect of this, let us assume three values for nB (98, 99 and 100) and denote by c_i and c_j the selected cluster sizes. We have:
nB | i = ceil(0.95*(xSwE.nB+1)) | P(k >= c_i) | j = 1 + ceil(0.95*(xSwE.nB+1)) | P(k >= c_j) |
---|---|---|---|---|
98 | 95 | 5/99 = 0.0505 | 96 | 4/99 = 0.0404 |
99 | 95 | 6/100 = 0.06 | 96 | 5/100 = 0.05 |
100 | 96 | 6/101 = 0.0594 | 97 | 5/101 = 0.0495 |
Note that we retrieve exactly 5% only when (nB + 1) is a multiple of 20. So, which would be best?
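The table can be reproduced numerically; below is a Python sketch (assuming all nB + 1 values of the max distribution are distinct, so that P(k >= c_i) = (nB + 2 - i)/(nB + 1)):

```python
import math

def table_row(nB):
    n = nB + 1                  # size of the max distribution
    i = math.ceil(0.95 * n)     # current index
    j = i + 1                   # proposed index
    p_i = (n - i + 1) / n       # P(k >= c_i), distinct values assumed
    p_j = (n - j + 1) / n       # P(k >= c_j)
    return i, p_i, j, p_j

for nB in (98, 99, 100):
    print(nB, table_row(nB))
```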
Another issue arises if we want to change the way the cluster critical threshold is computed. It should be the smallest k in the max distribution with P < 0.05. In that case the formula maxClusterSize(1 + ceil(0.95*(xSwE.nB+1))) seems to work for nB = 98 or 100, but is not correct for nB = 99 because we get exactly 0.05. I would need to compute it as j = ceil( (mod(nB+1, 20) == 0) + 1 + 0.95*(xSwE.nB+1)).
What do you think about this?
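As a quick numerical check of this correction (a Python sketch under the same distinct-values assumption): the mod-corrected index keeps P strictly below 0.05 for all three nB values, while the plain 1 + ceil(0.95*(nB+1)) gives exactly 0.05 when nB = 99:

```python
import math

def p_at(j, n):
    # P(k >= c_j) when the n values of the max distribution are distinct
    return (n - j + 1) / n

def indices(nB):
    n = nB + 1
    j_plain = 1 + math.ceil(0.95 * n)                  # plain proposal
    j_fixed = math.ceil((n % 20 == 0) + 1 + 0.95 * n)  # mod-corrected index
    return j_plain, j_fixed

for nB in (98, 99, 100):
    n = nB + 1
    j_plain, j_fixed = indices(nB)
    print(nB, p_at(j_plain, n), p_at(j_fixed, n))
```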
I'm very confused... with Andrew Holmes, I thought very long and hard about how to get the right critical threshold from a distribution, and we always arrived at the same thing:
idx = ceil((1-alph)*nPerm);
sMax = sort(MaxPerm);
Thr = sMax(idx);
With this approach, all test statistics above threshold will have p-values...
x = % a statistic value with x>=Thr
Pfwe = sum(MaxPerm>=x)/nPerm;
that are less than or equal to alpha.
Are you not finding this?
It should be the smallest k in the max distribution with P < 0.05
I disagree; it should be the smallest statistic with P <= 0.05.
How does this change your calculations?
Running the following code
nPerm = 100;
alph = 0.05;
MaxPerm = 1:100; % for the sake of example
idx = ceil((1-alph)*nPerm);
sMax = sort(MaxPerm);
Thr = sMax(idx);
%With this approach, all test statistics above threshold will have p-values...
x = Thr; % a statistic value with x>=Thr
Pfwe = sum(MaxPerm>=x)/nPerm
x = Thr+0.1; % a statistic value with x > Thr
Pfwe = sum(MaxPerm>=x)/nPerm
gives
Pfwe =
0.0600
Pfwe =
0.0500
So, for x = Thr, we do not get a p-value <= 0.05 (it is 0.06). That only occurs for values x > Thr.
Regarding,
It should be the smallest k in the max distribution with P < 0.05
I disagree; it should be the smallest statistic with P <= 0.05.
I am always confused about this. On the SPM results page, below the image, it is written k = 65 (p < 0.05 (FWE)). That seems contradictory. Also, swe_getSPM systematically discards p-values equal to the threshold fwep_c:
Q = find(ps_fwe<fwep_c);
I am not sure that this is the wanted behaviour. Overall, I am simply confused by all these values for discrete distributions, because the = is very meaningful there.
Do you have the Holmes et al. JCBFM paper (1996?)? Or look in Andrew Holmes' thesis and see if you can find some clarity. --
Thomas Nichols, PhD Professor of Neuroimaging Statistics Nuffield Department of Population Health | University of Oxford Big Data Institute | Li Ka Shing Centre for Health Information and Discovery Old Road Campus | Headington | Oxford | OX3 7LF | United Kingdom T: +44 1865 743590 | E: thomas.nichols@bdi.ox.ac.uk W: http://nisox.org | http://www.bdi.ox.ac.uk
Here is some text from Nichols and Holmes (2001):
"The 95th percentile is 462, so any suprathreshold clusters with size greater than 462 voxels can be declared significant at level 0.05"
"With 924 permutations, the 95th percentile is at 924 x 0.05 = 46.2, so the critical threshold is the 47th largest member of the permutation distribution. Any voxel with intensity greater than this threshold can be declared significant at the 0.05 level"
So the threshold (Thr) is set at the 95th percentile, as currently implemented, but it is exclusive: any value c > Thr survives the thresholding, but a cluster of size Thr does not. I absolutely agree with this way of thresholding, and it is consistent with the way I have been thresholding WB p-values in the past. However, it contradicts the way the toolbox currently thresholds. Indeed, it thresholds using the p-values as ps_fwe < 0.05, which will exclude the clusters with ps_fwe = 0.05 even if their cluster sizes are greater than Thr. Thus, I believe we should include them as well by using ps_fwe <= 0.05. That would make more sense to me now. Do you agree with this change?
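The difference between the two rules can be sketched on hypothetical FWE p-values (Python for illustration; in the toolbox the comparison happens in MATLAB):

```python
fwep_c = 0.05
ps_fwe = [0.010, 0.050, 0.051, 0.200]   # hypothetical cluster FWE p-values

exclusive = [p for p in ps_fwe if p < fwep_c]    # current rule: drops p == 0.05
inclusive = [p for p in ps_fwe if p <= fwep_c]   # proposed rule: keeps it
```

With the exclusive rule only the 0.010 cluster survives; the inclusive rule also keeps the cluster with p exactly 0.05.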
Regarding the stats on the display, I still believe that
clus k (5% FWE) = Th
Extent threshold k = x voxels (p < 0.05 FWE)
are confusing. The first uses the 95th percentile Th (i.e. the exclusive threshold), while the second uses the size of the smallest cluster surviving the thresholding (i.e. an inclusive threshold), which, by the way, may have p = 0.05 (contradicting the text p < 0.05 FWE). I would write this differently to avoid confusion.
I am embarrassed to have forgotten these key details... indeed, in SnPM, thresholds are found with >= but applied with >.
For example: p-value computation (note, however, that instead of sum(Perm>=x), it is implemented as sum(Perm>x-tol) to get the expected p-values when x is identical to an element of Perm up to numerical precision): https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L502
And thresholding (finding voxels that survive the threshold) https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L1028
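The need for the tolerance in the p-value computation can be illustrated with a small Python sketch (the tol value here is arbitrary): when x is computed by a different sequence of floating-point operations than the matching entry of Perm, the >= comparison can silently miss it.

```python
perm = [0.3, 0.5]     # permutation values; one equals x "on paper"
x = 0.1 + 0.2         # 0.30000000000000004 in floating point, not 0.3

naive = sum(v >= x for v in perm)        # misses the 0.3 entry: counts 1
tol = 1e-8
robust = sum(v > x - tol for v in perm)  # counts both entries: 2
```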
So... please accept my apologies for my insistence.
Now, on the footer... I totally agree that "size of the smallest cluster surviving the thresholding" is weird, and we should simply list the threshold... right?
No need to apologise. The most important thing is to make sure things are done correctly in the toolbox. Using sum(Perm>x-tol) instead of sum(Perm>=x) makes sense due to numerical precision. I will go through the SwE code to check whether it is implemented this way.
I have also inspected the cluster-wise thresholding in SnPM and it is done using the cluster sizes:
https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L1010-L1017
not using the FWER p-values as in SwE. To make things consistent, I think it would be better to mimic SnPM and do the thresholding on the cluster sizes as well. I will edit the code accordingly.
For the footer, I agree that having only the threshold will be indeed less confusing.
Super... we're on the same page.
Wait! Something just occurred to me! I don't see how >= for p-values but > for thresholding can be correct.
Consider the trivial case of 6 observations in a two-sample t-test, 2 groups of 3. There are 6-choose-3 = 20 permutations, and say that the correctly labelled statistic, 4.2, is the largest. The p-value is then #{T* >= 4.2}/20 = 0.05, but you're saying we won't reject the null because 4.2 is not strictly greater than the 95th percentile, 4.2?
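The example can be checked by brute force; here is a Python sketch with made-up data in which the correct labelling gives the largest statistic, so the p-value is exactly 1/20:

```python
from itertools import combinations

data = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]   # hypothetical values; first 3 = group A

def mean_diff(group_a_idx):
    # mean difference between group A (3 chosen indices) and the rest
    a = [data[i] for i in group_a_idx]
    b = [data[i] for i in range(6) if i not in group_a_idx]
    return sum(a) / 3 - sum(b) / 3

perms = [mean_diff(c) for c in combinations(range(6), 3)]  # all 20 relabellings
obs = mean_diff((0, 1, 2))              # correct labelling: the unique maximum
p = sum(t >= obs for t in perms) / len(perms)   # 1/20 = 0.05
```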
More information:
The 'bible' for FWE permutation, the Westfall & Young (1993) book, defines (FWE-)adjusted p-values with >=; however, W&Y is never clear on whether the original statistic is included in the permutation distribution.
I looked at Andrew Holmes 1996 paper, and he does a weird two step:
He defines the threshold to be the (c+1)-th largest element of the ordered (max) permutation distribution, for c = floor(alpha*nPerm); so for the 20-permutation case, c = 1, and we're instructed to use the (c+1)-th largest, i.e. the 2nd largest element of the (max) permutation distribution, but to use a strict inequality when thresholding.
So... it works... but somewhere along the line we (or just I) got rid of the c+1 concept, and I don't know whether I then turned the >'s into >='s.
Let me know what you think of all of this.
I think the c+1 concept is equivalent to selecting the c_star = ceil((1-alpha)*nPerm)-th smallest score, as is currently done in SnPM. The code below seems to demonstrate this:
nPerm = 1000;
alpha = 0.05;
score_ascend = 1:nPerm;
score_descend = sort(score_ascend, 2, 'descend');
c=floor(alpha * nPerm);
c_star=ceil((1-alpha) * nPerm);
[score_descend(c + 1), score_ascend(c_star)]
Both give the same threshold, which is then used to select only values strictly greater than it.
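The equivalence holds in general, not just for this example: the (c+1)-th largest of nPerm values is the (nPerm - c)-th smallest, and nPerm - floor(alpha*nPerm) = ceil((1-alpha)*nPerm). A quick Python check over a few (nPerm, alpha) pairs:

```python
import math

for nPerm in (20, 99, 100, 924, 1000):
    for alpha in (0.01, 0.05, 0.1):
        c = math.floor(alpha * nPerm)             # Holmes: use (c+1)-th largest
        c_star = math.ceil((1 - alpha) * nPerm)   # SnPM: use c_star-th smallest
        # (c+1)-th largest of nPerm values == (nPerm - c)-th smallest
        assert nPerm - c == c_star
```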
Also, in your nPerm = 20 example, the 95th percentile is the largest score after 4.2 (c + 1 = 2 or c_star = 19). Thus, unless that score is also 4.2, the null will be rejected for 4.2. In the scenario where that score is also 4.2, the null will be accepted, but the p-value will be 0.1 (i.e. 2/20). So that is consistent.
A different question is how to threshold the p-values. I would say that we should use an inclusive threshold in order to be consistent, i.e. reject the null for all p-values <= 0.05. Currently, the toolbox uses an exclusive threshold, which I disagree with. Holmes et al. (1996) seems to agree with an inclusive threshold:
For a test at level a, this adjusted p value image is thresholded at level a, declaring voxels with adjusted p values less than or equal to a as activated
Also, regarding the notation in the result display, I would adopt a notation like in Holmes et al. (1996), i.e. something like P(c > k) <= alpha. Currently, it shows something different.
OK... so I get it. P-values are computed with >= and can be thresholded inclusively (p <= alpha), but statistic values must be thresholded with an exclusive threshold (> Thr).
Got it now.
Yes, go for the changes you suggest.
I believe everything related to this issue has been fixed in PR #125 and PR #131.
Working on issue #103 (related to a post-hoc masking issue), I have noted that the display of results for cluster-wise WB does not behave appropriately.
Indeed, after running the results module from the batch system:
1. If I start by selecting "Activation", the result display seems correct. Then, if I switch the contrast to "Deactivation", I also get a display that looks OK.
2. If I start by selecting "Deactivation", I get an empty result. Then, if I switch the contrast to "Activation", I also get an empty result.
These two sets of result displays are contradictory. Also, note that the cluster threshold in case 1 is the same for deactivation and activation, which is suspicious. As the behaviour in case 2 is definitely wrong and the behaviour in case 1 seems suspicious, I feel there is a need to review all the code for the cluster-wise WB display to ensure that it behaves appropriately.