NISOx-BDI / SwE-toolbox


The display of clusterwise WB results does not seem correct #122

Closed by BryanGuillaume 5 years ago

BryanGuillaume commented 5 years ago

While working on issue #103 (related to a post-hoc masking issue), I have noticed that the display of results for clusterwise WB does not behave appropriately.

Indeed, after running the results module from the batch system,

  1. If I start by selecting "Activation", the result display seems correct. Then, if I switch the contrast to "Deactivation", the display also looks OK.

    [Screenshots: 2019-06-10 at 15 08 54, 2019-06-10 at 15 09 07]
  2. If I start by selecting "Deactivation", I get an empty result. Then, if I switch the contrast to "Activation", I also get an empty result.

[Screenshots: 2019-06-10 at 15 19 09, 2019-06-10 at 15 09 34]

These two sets of result displays are contradictory. We can also note that the cluster threshold in case 1 is the same for deactivation and activation, which is suspicious. As the behaviour in case 2 is definitely wrong and the behaviour in case 1 is suspicious, I feel there is a need to review all the code for the clusterwise WB display to ensure that it behaves appropriately.

BryanGuillaume commented 5 years ago

I can confirm that the results for Deactivation in case 1 above are not correct, as there should be 6 clusters with p = 0.05 and not 3 as displayed. Nevertheless, the result display for Activation in case 1 seems correct. I fear that the cluster threshold used for Activation was also used for Deactivation.

BryanGuillaume commented 5 years ago

I can also see that the value clus k (5% FWE): 58 is the same for Activation and Deactivation (in case 1 and case 2). Using the info in SwE.WB.clusterInfo, I can compute:

In addition, I feel that the notation used for this (i.e. clus k (5% FWE): 58) might be ambiguous, as a cluster of size 58 actually has a p-value of 0.1 and not 0.05.

nicholst commented 5 years ago

Sorry for the late reply. This is not good. Can I establish: Is this just an error associated with the 'on-the-fly' switching of contrasts in the SPM Results interface? Or is it more fundamental?

Or has this been addressed in the new release and so can be closed?

BryanGuillaume commented 5 years ago

I think the majority of the bugs reported here have actually been fixed in PR #125. I left the issue open because I felt that I had not done enough testing. The latter is rather time-consuming because, as you mentioned, some bugs appear only 'on-the-fly' after switching contrasts in a particular order. I will take some time today to do more thorough testing and see whether all these bugs have been fixed.

BryanGuillaume commented 5 years ago

After some testing, it seems that the following 2 bugs have been fixed:

Indeed, now, when I start with Deactivation, the results are not empty and seem correct.

Indeed, switching from deactivation to activation and vice versa, the estimate changes. Though I still believe that this value is ambiguous. For example, for the activation, I have clus k (5% FWE): 45. In reality, we have P(clus k > 45) = 5% and P(clus k >= 45) = 5.1% due to the way the WB p-values are computed. Thus a cluster of size 45 has a p-value of 5.1%, not 5%, which is kind of ambiguous. On the same note, the k value used for the threshold is currently set to the size of the smallest cluster that survives the thresholding. It is therefore computed differently from the value clus k (5% FWE). I am a bit unsure what the appropriate value to use here would be.
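For reference, here is a minimal sketch of how I understand a WB FWE cluster p-value to be derived from the max-cluster-size distribution (illustrative only, not the actual toolbox code; the field maxClusterSize follows SwE.WB.clusterInfo as used later in this thread):

    % Illustrative sketch, not the actual SwE code: FWE p-value of an observed
    % cluster of size k, taken from the bootstrap max-cluster-size distribution.
    maxClusterSize = SwE.WB.clusterInfo.maxClusterSize; % assumed to hold nB + 1 values
    k    = 45;                                          % observed cluster size
    pFWE = mean(maxClusterSize >= k);                   % P(max cluster size >= k)
    % With the 95th-percentile threshold equal to 45, P(clus k > 45) can be 5%
    % while P(clus k >= 45) is slightly above 5%, hence the ambiguity noted above.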

Below are the results illustrating the points above.

Starting With Activation and showing Activation

startAct_act

Starting With Activation, but switching to Deactivation

startAct_deact

Starting With Deactivation and showing Deactivation

startDeact_deact

Starting With Deactivation, but switching to Activation

startDeact_act

BryanGuillaume commented 5 years ago

The bug seems to be linked to the variable thresDesc, which is saved in xSwE. It is first set by the user in the UI at the beginning of the result display, which is fine. However, the code overwrites its value later. Thus, when the contrast is changed, xSwE.thresDesc is no longer correct.

Below is the relevant piece of code:

    try
        thresDesc = xSwE.thresDesc;
    catch
        if strcmp(orig_infType, 'clus')
            % For WB we have FWE or p/k specification if the user
            % originally ran a clusterwise analysis.
            str = 'FWE|none';
            thresDesc = spm_input('extent p value adjustment to control','+1','b',str,[],1);
        else
            % If the user originally ran voxelwise or TFCE we have no
            % FWE clusterwise map from the bootstrap.
            thresDesc = 'none';
        end
    end

    switch thresDesc

        case 'FWE' % Family-wise false positive rate

            ....

            pu = SwE.WB.clusterInfo.primaryThreshold;
            thresDesc = ['p<' num2str(pu) ' (unc.)'];

Note that this happens for other WB options in the code as well. I will try to change this and see if it fixes the bug.

BryanGuillaume commented 5 years ago

Digging further into the code, it appears that the sub-function mychgcon in swe_results_ui.m is actually supposed to appropriately convert back the string stored in xSwE.thresDesc when the contrast is changed. It fails to do so correctly for cluster-wise WB analyses, which explains the bug above. I have added the following piece of code to it:

    % for cluster-wise WB, the code above does not work. We need to overwrite this
    if xSwE.WB == 1 && strcmp(xSwE.clustWise, 'FWE')
      xSwE2.thresDesc = 'FWE';
      xSwE2.u = NaN;
      xSwE2.k = NaN;
    end

That seems to fix the bug noted above, as can be seen in the screenshots below:

Starting With Activation and showing Activation

startAct_act2

Starting With Activation, but switching to Deactivation

startAct_deact2

Starting With Deactivation and showing Deactivation

startDeact_deact2

Starting With Deactivation, but switching to Activation

startDeact_act2

However, I feel that there might be other bugs when selecting some other options because, as observed with this bug, swe_results_ui may not always send the correct information to swe_getSPM. I feel there is a need to check all options one by one to ensure that this is not the case.

nicholst commented 5 years ago

Wow... OK... thanks for chasing all of this. But one thing disturbs me:

... I have clus k (5% FWE): 45. In reality, we have P(clus k > 45) = 5% and P(clus k >= 45) = 5.1% due to the way the WB p-value are computed.

P-values have a well-established and unique definition: the probability of observing a value as or more extreme than the one actually observed. Hence P(clus k >= 45) is the correct calculation.

About the footer... in SPM, just b/c it was easier, the cluster critical threshold is indeed the smallest non-significant cluster size. This is silly. We have the null distribution and of course should compute this from the permutation distribution.

Let me know if this makes sense.

BryanGuillaume commented 5 years ago

I think I agree with what you are saying. Then we should change the way clus k (5% FWE) is computed by selecting the next cluster size in the max distribution.

Currently, we have

    if Ic == 1
        maxClusterSize = sort(SwE.WB.clusterInfo.maxClusterSize);
    elseif Ic == 2
        maxClusterSize = sort(SwE.WB.clusterInfo.maxClusterSizeNeg);
    else
        error("Unknown contrast");
    end
    xSwE.Pfc = maxClusterSize(ceil(0.95*(xSwE.nB+1))); % Clusterwise FWE P

Should we have something like this instead?

    xSwE.Pfc = maxClusterSize(1 + ceil(0.95*(xSwE.nB+1))); % Clusterwise FWE P

To show the effect of this, let us consider 3 values for nB (98, 99 and 100) and denote by c_i and c_j the selected cluster sizes. We have:

    nB    i = ceil(0.95*(xSwE.nB+1))   P(k >= c_i)      j = ceil(1 + 0.95*(xSwE.nB+1))   P(k >= c_j)
    98    95                           5/99  = 0.0505   96                               4/99  = 0.0404
    99    95                           6/100 = 0.06     96                               5/100 = 0.05
    100   96                           6/101 = 0.0594   97                               5/101 = 0.0495

Note that we can retrieve exactly 5% only when (nB + 1) is a multiple of 20. So, what would be the best choice?

Another issue seems to occur if we want to change the way the cluster critical threshold is computed. It should be the smallest k in the max distribution with P < 0.05. In that case the formula maxClusterSize(1 + ceil(0.95*(xSwE.nB+1))) seems to work for nB = 98 or 100, but is not correct for nB = 99 because we get exactly 0.05. I would need to compute it as j = ceil( (mod(nB+1, 20) == 0) + 1 + 0.95*(xSwE.nB+1)). A small check of these indexing choices is sketched below.
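As a sanity check (standalone toy example, not toolbox code; it simply uses 1:(nB+1) as the sorted max distribution, so that P(k >= c_m) = (nB + 2 - m)/(nB + 1)), the following reproduces the numbers in the table and the adjusted index:

    % Toy check of the indexing choices discussed above (illustrative only).
    for nB = [98 99 100]
        maxClusterSize = 1:(nB+1);                          % toy sorted max distribution
        i  = ceil(0.95*(nB+1));                             % current index
        j  = ceil(1 + 0.95*(nB+1));                         % next value in the max distribution
        j2 = ceil((mod(nB+1, 20) == 0) + 1 + 0.95*(nB+1));  % adjusted so that P < 0.05 strictly
        P  = @(m) mean(maxClusterSize >= maxClusterSize(m));
        fprintf('nB = %3d: P_i = %.4f, P_j = %.4f, P_j2 = %.4f\n', nB, P(i), P(j), P(j2));
    end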

What do you think about this?

nicholst commented 5 years ago

I'm very confused... with Andrew Holmes, I thought very long and hard about how to get the right critical threshold from a distribution, and we always arrived at the same thing:

idx = ceil((1-alph)*nPerm);
sMax = sort(MaxPerm);
Thr = sMax(idx);

With this approach, all test statistics above threshold will have p-values...

x = % a statistic value with x>=Thr
Pfwe = sum(MaxPerm>=x)/nPerm;

that are less than or equal to alpha.

Are you not finding this?

It should be the smallest k in the max distribution with P < 0.05

I disagree; it should be the smallest statistic with P <= 0.05.

How does this change your calculations?

BryanGuillaume commented 5 years ago

Running the following code

nPerm = 100;
alph = 0.05;
MaxPerm = 1:100; % for the sake of example

idx = ceil((1-alph)*nPerm);
sMax = sort(MaxPerm);
Thr = sMax(idx);

% With this approach, all test statistics above threshold will have p-values...

x = Thr; % a statistic value with x = Thr
Pfwe = sum(MaxPerm>=x)/nPerm

x = Thr+0.1; % a statistic value with x > Thr
Pfwe = sum(MaxPerm>=x)/nPerm

gives

Pfwe =

    0.0600

Pfwe =

    0.0500

So for x = Thr, we do not get a p-value <= 0.05 (it is 0.06). This only occurs for values x > Thr.

Regarding,

It should be the smallest k in the max distribution with P < 0.05

I disagree; it should be the smallest statistic with P <= 0.05.

I am always confused about this. On the SPM result page, below the image, it is written k = 65 (p < 0.05 (FWE)). That seems contradictory. Also, swe_getSPM systematically discards p-values equal to the threshold fwep_c:

Q = find(ps_fwe<fwep_c);

I am not sure that this is the wanted behaviour. Overall, I am simply confused by all these values for discrete distributions, because the '=' is very meaningful there.

nicholst commented 5 years ago

Do you have the Holmes et al. JCBFM paper (1996??)? Or look in Andrew Holmes' thesis and see if you can find some clarity.



BryanGuillaume commented 5 years ago

Here is some text from Nichols and Holmes (2001):

"The 95th percentile is 462, so any suprathreshold clusters with size greater than 462 voxels can be declared significant at level 0.05"

"With 924 permutations, the 95th percentile is at 924 x􏰅 0.05 =􏰆 46.2, so the critical threshold is the 47th largest member of the permutation distribution. Any voxel with intensity greater than this threshold can be declared significant at the 0.05 level"

So the threshold (Thr) is taken at the 95th percentile, as currently implemented, but it is exclusive, i.e. any cluster of size c > Thr survives the thresholding, but a cluster of size Thr does not. I absolutely agree with this way of thresholding and it is consistent with the way I have been thresholding WB p-values in the past. However, it is in contradiction with the way the toolbox currently thresholds. Indeed, it thresholds using the p-values as

ps_fwe < 0.05

which will exclude the clusters with ps_fwe = 0.05 even if their cluster sizes are greater than Thr. Thus, I believe we should include them as well by using ps_fwe <= 0.05. That would make more sense to me now. Do you agree with this change?
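To make the difference concrete, here is a minimal sketch (made-up p-values, not the toolbox code; the variable names mirror the snippet above) of the two thresholding rules:

    % Hypothetical FWE-corrected cluster p-values; a cluster with ps_fwe exactly
    % equal to 0.05 is dropped by '<' but kept by '<='.
    ps_fwe = [0.01 0.05 0.06 0.20];
    fwep_c = 0.05;                          % nominal FWE level
    Q_exclusive = find(ps_fwe <  fwep_c)    % current behaviour: returns 1
    Q_inclusive = find(ps_fwe <= fwep_c)    % proposed behaviour: returns [1 2]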

Regarding the stats on the display, I still believe that the two reported values (the clus k (5% FWE) value and the k threshold in the footer) are confusing. The first uses the 95th percentile Thr (i.e. the exclusive threshold) and the second uses the size of the smallest cluster surviving the thresholding (i.e. an inclusive threshold), which, by the way, may have p = 0.05 (in contradiction with the text p < 0.05 FWE). I would try to write this down differently to avoid confusion.

nicholst commented 5 years ago

I am embarrassed to have forgotten these key details... indeed, in SnPM, thresholds are found with >= but applied with >.

For example: P-value computation (however, note that instead of sum(Perm>=x), it is implemented as sum(Perm>x-tol) to get the expected p-values when x is identical to one element of Perm up to numerical precision): https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L502

And thresholding (finding voxels that survive the threshold) https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L1028

So... please accept my apologies for my insistence.

Now, on the footer... I totally agree that "size of the smallest cluster surviving the thresholding" is weird, and we should simply list the threshold... right?

BryanGuillaume commented 5 years ago

No need to apologise. The most important thing is to make sure things are done correctly in the toolbox.

Using sum(Perm>x-tol) instead of sum(Perm>=x) makes sense due to numerical precision. I will go through the SwE code to check whether it is implemented this way.
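A tiny illustration of why the tolerance matters (toy values; tol is an illustrative constant, not the one used in SnPM):

    % The 'same' value computed in two different ways can differ by ~1e-16 in
    % double precision, so '>=' can miss a genuinely tied permutation value.
    Perm = [0.3, 1, 2];           % toy permutation values; 0.3 is stored as 0.2999...99
    x    = 0.1 + 0.1 + 0.1;       % nominally 0.3 as well, but stored as 0.3000...04
    tol  = 1e-10;                 % small tolerance (illustrative value)
    [sum(Perm >= x), sum(Perm > x - tol)]   % returns [2 3]: '>=' misses the tied value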

I have also inspected the cluster-wise thresholding in SnPM and it is done using the cluster sizes:

https://github.com/SnPM-toolbox/SnPM-devel/blob/master/snpm_pp.m#L1010-L1017

not using the FWER p-values as in SwE. To make things consistent, I think it would be better to mimic SnPM and do the thresholding on the cluster sizes as well. I will edit the code accordingly.
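A sketch of what this could look like (illustrative only; clusterSize, holding the observed cluster sizes, is a hypothetical variable, and the other names follow the snippets earlier in this thread):

    % Illustrative sketch: exclusive thresholding on cluster sizes, SnPM-style.
    maxClusterSize = sort(SwE.WB.clusterInfo.maxClusterSize); % sorted max distribution
    nB   = numel(maxClusterSize) - 1;                         % assuming nB + 1 stored values
    Thr  = maxClusterSize(ceil(0.95*(nB+1)));                 % 95th percentile threshold
    surv = find(clusterSize > Thr);                           % strictly greater survives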

For the footer, I agree that having only the threshold will indeed be less confusing.

nicholst commented 5 years ago

Super... we're on the same page.

nicholst commented 5 years ago

Wait! Something just occurred to me! I don't see how the >= for p-values but > for thresholding can be correct.

Consider the trivial case of 6 observations in a two-sample t-test, 2 groups of 3. There are 6-choose-3 = 20 permutations, and say that the correctly labeled statistic, say 4.2, is the largest. So the p-value is #{T*>=4.2}/20=0.05 but you're saying we won't reject the null because 4.2 is not strictly greater than the 95%ile 4.2?

nicholst commented 5 years ago

More information:

The 'bible' for FWE permutation, the Westfall & Young (1993) book, defines (FWE) adjusted p-values with >=; however, W&Y is never clear on whether the original statistic is included in the permutation distribution.

I looked at Andrew Holmes' 1996 paper, and he does a weird two-step:

He defines the threshold to be the (c+1)-th largest element of the ordered (max) permutation distribution, for c = floor(alpha * nPerm); so for the 20-perm case, c = 1, and then we're instructed to use the (c+1)-th largest, i.e. 2nd largest, element of the (max) perm distribution, *but* use strict inequality when thresholding.

So... it works... somewhere along the line we (or just I) got rid of the c+1 concept but I don't know that I then turned the >'s into >='s.

Let me know what you think of all of this.

BryanGuillaume commented 5 years ago

I think the c+1 concept is equivalent to selecting the c_star-th smallest score, with c_star = ceil((1-alpha) * nPerm), as is currently done in SnPM. The code below seems to demonstrate this:

nPerm = 1000;
alpha = 0.05;

score_ascend  = 1:nPerm;
score_descend = sort(score_ascend, 2, 'descend');

c      = floor(alpha * nPerm);
c_star = ceil((1-alpha) * nPerm);

[score_descend(c + 1), score_ascend(c_star)]

Both give the same threshold, which is then used to select only the values strictly greater than it.

Also, in your nPerm = 20 example, the 95th percentile is the largest score after 4.2 (c + 1 = 2 or c_star = 19). Thus, unless that score is also 4.2, the null will be rejected for 4.2. In the scenario where that score is also 4.2, the null will be accepted, but the p-value will be 0.1. So that is consistent.
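A quick check of the two scenarios (toy permutation values, purely illustrative):

    % Case 1: the 2nd largest permutation value is below 4.2 -> 4.2 is rejected.
    MaxPerm = [4.2 4.0 linspace(0.1, 3.9, 18)];     % 20 values, maximum is 4.2
    Thr = sort(MaxPerm); Thr = Thr(ceil(0.95*20));  % 19th smallest = 2nd largest = 4.0
    [4.2 > Thr, sum(MaxPerm >= 4.2)/20]             % reject = 1, p = 0.05

    % Case 2: the 2nd largest value is also 4.2 -> not rejected, and indeed p = 0.1.
    MaxPerm = [4.2 4.2 linspace(0.1, 3.9, 18)];
    Thr = sort(MaxPerm); Thr = Thr(ceil(0.95*20));  % 2nd largest = 4.2
    [4.2 > Thr, sum(MaxPerm >= 4.2)/20]             % reject = 0, p = 0.1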

A different question is how to threshold the p-values. I would say that we should use an inclusive threshold in order to be consistent, i.e. reject the null for all p-values <= 0.05. Currently, the toolbox uses an exclusive threshold, which I disagree with. Holmes et al. (1996) seems to agree with this inclusive threshold:

"For a test at level a, this adjusted p value image is thresholded at level a, declaring voxels with adjusted p values less than or equal to a as activated"

Also, regarding the notation in the result display, I would adopt notation like that in Holmes et al. (1996), i.e. something like P(c > k) <= alpha. Currently, it shows something different.

nicholst commented 5 years ago

OK... so I get it. P-values are computed on the basis of >= and can be thresholded on the basis of >=, but statistic values must be thresholded with an exclusive threshold.

Got it now.

Yes, go for the changes you suggest.

BryanGuillaume commented 5 years ago

I believe everything related to this issue has been fixed in PR #125 and PR #131.