Number of maximal permutations for non-parametric approach

tramarobin commented 9 months ago

Hi Todd,

I am collaborating on a project with a small number of participants (n=9) who are each experiencing two different conditions. I am using a paired t-test with a non-parametric setting. The maximum number of permutations possible with this dataset is 2^9 = 512.

I initially thought that if I were to perform all possible permutations, there would be no more randomness in the selection of permutations, and the threshold for the test should remain consistent when I run the test multiple times.

However, I have observed variations in the threshold (I conducted the test 1000 times). Furthermore, the variation increased significantly when I reduced the alpha level to 0.01.

Am I overlooking something?

Best, Robin Trama

0todd0000 commented 9 months ago

Hello! If you are using all permutations the result should be identical each time, so something does indeed sound strange. Can you please paste the spm1d section of your code into this thread?

tramarobin commented 9 months ago

The code runs in matlab.

The t-test is: Ttest = spm1d.stats.nonparam.ttest_paired(DATA{1}, DATA{2}); where DATA{1} and DATA{2} are my 2 conditions for my 9 participants, each of size 9x36.

My inference is this one: Ttest_inf=Ttest.inference(0.05,'iterations',512,'two_tailed',logical(1));

If I request 513 iterations, an error message confirms that 512 is indeed the maximum number of iterations: "Number of specified iterations (513) exceeds the maximum possible number of iterations (512)". I observe the same behavior for a one-tailed test.

While Ttest stays the same, some values of Ttest_inf (Ttest_inf.PDF1, Ttest_inf.PDF0, Ttest_inf.permuter.ZZ, Ttest_inf.permuter.Z2, and Ttest_inf.permuter.Z) change each time I execute the line.

0todd0000 commented 9 months ago

Please try using -1 for iterations instead of 512 like this:

Ttest_inf = Ttest.inference(0.05, 'iterations',-1, 'two_tailed',logical(1) );

When you use -1 all existing permutations (all 512) will be executed.

When you use a positive integer, permutations will be selected randomly. This means that a single permutation may be run multiple times.
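To illustrate why repeated draws matter, here is a minimal Python sketch (standard library only, not spm1d code) that draws 512 sign-flip patterns at random with replacement from the 512 possible patterns for 9 participants and counts how many are distinct. The function name `count_distinct_draws` and the integer encoding of patterns are assumptions made for this illustration.

```python
import random

def count_distinct_draws(n_subjects, n_draws, seed):
    """Draw sign-flip patterns with replacement and count the distinct ones."""
    rng = random.Random(seed)
    total = 2 ** n_subjects  # each paired difference is either flipped or not
    # Encode every sign-flip pattern as an integer in [0, total)
    draws = [rng.randrange(total) for _ in range(n_draws)]
    return len(set(draws)), total

distinct, total = count_distinct_draws(n_subjects=9, n_draws=512, seed=0)
# Expected number of distinct patterns after n_draws draws with replacement
expected = total * (1 - (1 - 1 / total) ** 512)
print(f"{distinct} distinct patterns in 512 draws (expected about {expected:.0f} of {total})")
```

Under these assumptions only roughly 63% of the 512 patterns are expected to appear, and which ones appear changes from run to run, so the permutation distribution also changes from run to run.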

tramarobin commented 9 months ago

The -1 solution works, thank you.

This raises 2 additional questions.

1) I am often doing more than 10000 iterations, so my 'force_iterations' parameter is logical(1), and my number of iterations is greater than 10000. In this case, will some of the permutations be the same?

2) For 10000 iterations, which is better? Ttest_inf = Ttest.inference(0.05, 'iterations', -1, 'two_tailed', logical(1), 'force_iterations', logical(0)); or Ttest_inf = Ttest.inference(0.05, 'iterations', 10000, 'two_tailed', logical(1));

tramarobin commented 9 months ago

Hi Todd,

I ran some tests on my own and I found disturbing results.

I took another set of data with 20 participants, allowing 1048576 maximal iterations.

I ran the test 10 times with 1048576 iterations. The threshold is very stable, as expected (the coefficient of variation of the threshold is 0.1218%). thresholds=[2.29248346308494 2.28658525490232 2.29233312190740 2.29049957362352 2.29497006999369 2.29291770922786 2.29367038975881 2.29393680779557 2.29524502075649 2.28858357132402]
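For reference, the quoted coefficient of variation can be reproduced from the ten listed thresholds. This is a small Python check, assuming the sample standard deviation (n - 1 denominator, which is also MATLAB's default for std). It also computes how far the exhaustive (-1) threshold reported in this thread lies from the mean of the random runs.

```python
import statistics

# The ten thresholds reported for 1048576 random iterations
thresholds = [
    2.29248346308494, 2.28658525490232, 2.29233312190740,
    2.29049957362352, 2.29497006999369, 2.29291770922786,
    2.29367038975881, 2.29393680779557, 2.29524502075649,
    2.28858357132402,
]

mean = statistics.mean(thresholds)
sd = statistics.stdev(thresholds)  # sample SD (n - 1 denominator)
cv_percent = 100 * sd / mean

# Threshold reported in this thread when iterations is set to -1
exhaustive = 2.595460922258361
rel_diff_percent = 100 * (exhaustive - mean) / mean

print(f"mean = {mean:.4f}, CV = {cv_percent:.4f}%")
print(f"exhaustive threshold is {rel_diff_percent:.1f}% above the mean")
```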

I also ran the test with the number of iterations set to -1 (the result is stable and always the same). I expected this value to be close to the previous thresholds. However, the threshold was 2.595460922258361, which is roughly 0.3 (more than 10%) higher than the previous thresholds.

Which threshold should I trust, then? I would intuitively use the test with all the permutations, but implementing that for tests with several million or more possible iterations is not feasible.

0todd0000 commented 9 months ago

> The -1 solution works, thank you.
>
> This raises 2 additional questions.
>
> 1) I am often doing more than 10000 iterations, so my 'force_iterations' parameter is logical(1), and my number of iterations is greater than 10000. In this case, will some of the permutations be the same?
>
> 2) For 10000 iterations, which is better? Ttest_inf = Ttest.inference(0.05, 'iterations', -1, 'two_tailed', logical(1), 'force_iterations', logical(0)); or Ttest_inf = Ttest.inference(0.05, 'iterations', 10000, 'two_tailed', logical(1));

I suggest:

1) Check the maximum number of possible permutations (N).
2) If N is reasonably small (< 100,000) then it may be desirable to run all permutations.
3) If N is moderate-to-large, then it is usually suitable to select 10,000 or 100,000 random permutations.
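The suggestion above can be sketched as a small decision helper. This is a hypothetical Python function, not part of spm1d: the name `choose_iterations`, its defaults, and the assumption that N = 2^n (sign flipping for a paired or one-sample design, as in the 2^9 = 512 case in this thread) are all illustrative. The returned value is what would be passed as the 'iterations' argument.

```python
def choose_iterations(n_subjects, small_limit=100_000, n_random=10_000):
    """Return -1 (run all permutations) or a random-subset size.

    Assumes a paired / one-sample design, where the number of
    possible sign-flip permutations is N = 2 ** n_subjects.
    """
    n_possible = 2 ** n_subjects
    if n_possible < small_limit:
        return -1        # N is small: run every permutation exactly once
    return n_random      # N is moderate-to-large: random subset

for n in (9, 16, 20):
    print(n, 2 ** n, choose_iterations(n))
```

With these defaults, 9 and 16 participants (N = 512 and N = 65536) return -1, while 20 participants (N = 1048576) returns 10000.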

0todd0000 commented 9 months ago

> Hi Todd,
>
> I ran some tests on my own and I found disturbing results.
>
> I took another set of data with 20 participants, allowing 1048576 maximal iterations.
>
> I ran the test 10 times with 1048576 iterations. The threshold is very stable, as expected (the coefficient of variation of the threshold is 0.1218%). thresholds=[2.29248346308494 2.28658525490232 2.29233312190740 2.29049957362352 2.29497006999369 2.29291770922786 2.29367038975881 2.29393680779557 2.29524502075649 2.28858357132402]
>
> I also ran the test with the number of iterations set to -1 (the result is stable and always the same). I expected this value to be close to the previous thresholds. However, the threshold was 2.595460922258361, which is roughly 0.3 (more than 10%) higher than the previous thresholds.

If you want to use the maximum number of iterations please use -1 and not a positive integer.

> Which threshold should I trust, then? I would intuitively use the test with all the permutations, but implementing that for tests with several million or more possible iterations is not feasible.

Trust -1 when N is reasonably small. This is the only case where information is not faked. If you use any number other than -1 a random subset of all permutations will be selected, and several of these will be selected multiple times, leading to incorrectly increased density at that point in the probability distribution.
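As an illustration of why the exhaustive case is repeatable, here is a minimal Python sketch (standard library only, not spm1d code) that enumerates all 2^9 = 512 sign flips for a scalar paired t-test on made-up differences. The toy data, the function names, and the simple empirical-quantile rule are assumptions for this example and are not claimed to match spm1d's internals.

```python
import itertools
import math

def t_stat(diffs):
    """One-sample t statistic of the paired differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def exhaustive_threshold(diffs, alpha=0.05):
    """Two-tailed critical |t| from every sign-flip permutation, each used once."""
    null = sorted(
        abs(t_stat([s * d for s, d in zip(signs, diffs)]))
        for signs in itertools.product((1, -1), repeat=len(diffs))
    )
    # Empirical (1 - alpha) quantile of the permutation distribution
    idx = min(len(null) - 1, math.ceil((1 - alpha) * len(null)) - 1)
    return null[idx], len(null)

# Hypothetical paired differences for 9 participants
toy_diffs = [0.8, -0.3, 1.2, 0.5, 0.9, -0.1, 0.4, 1.1, 0.7]

thr_a, n_perm = exhaustive_threshold(toy_diffs)
thr_b, _ = exhaustive_threshold(toy_diffs)
print(n_perm, thr_a == thr_b)
```

Because no random number generator is involved, repeated calls return exactly the same threshold.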

When N is too large to run all permutations (e.g. N=1e12) then I suggest selecting a specific number of permutations (n) (e.g. n=10000) and then running several times like you have. There will be numerical fluctuation, but provided that fluctuation does not qualitatively affect the results then the results can be trusted.
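The "run several times and check the fluctuation" advice can be sketched as follows. This is a toy Python example on a scalar paired t-test with made-up data, not spm1d code. The helper names, the 2000 iterations per run, and the quantile rule are assumptions chosen to keep the example fast.

```python
import math
import random
import statistics

def t_stat(diffs):
    """One-sample t statistic of the paired differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

def random_threshold(diffs, n_iter, alpha, rng):
    """Two-tailed critical |t| from n_iter random sign flips (with replacement)."""
    null = []
    for _ in range(n_iter):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        null.append(abs(t_stat(flipped)))
    null.sort()
    idx = min(len(null) - 1, math.ceil((1 - alpha) * len(null)) - 1)
    return null[idx]

# Hypothetical paired differences for 20 participants
data_rng = random.Random(1)
toy_diffs = [data_rng.gauss(0.5, 1.0) for _ in range(20)]

# Repeat the random-subset analysis with different seeds
run_thresholds = [
    random_threshold(toy_diffs, n_iter=2000, alpha=0.05, rng=random.Random(seed))
    for seed in range(5)
]
cv = statistics.stdev(run_thresholds) / statistics.mean(run_thresholds)
print([round(t, 3) for t in run_thresholds], f"CV = {100 * cv:.2f}%")
```

If the spread across runs is small relative to the distance between the test statistic and the threshold, the qualitative conclusion does not depend on which random subset was drawn.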

tramarobin commented 9 months ago

Hi Todd,

Thanks for your explanations!

When I read Nichols & Holmes 2002's paper, I never thought the random permutation tests they talked about involved picking a permutation and then putting it back in the sample. I guess part of that is because they used the max number of permutations in their example. But, when I looked at other articles and websites about random permutation tests, they all talked about choosing permutations without putting them back (unlike bootstrapping techniques). Do you think this difference in how permutations are handled could explain why there are different thresholds when using an integer instead of -1?

If this handling of permutations does matter, do you think it would be more accurate to change the algorithm so that permutations aren't put back in the sample? That way, if you select all permutations, the result would stay consistent. And if you choose a subset of permutations, you will do exactly this number of different permutations. Plus, it would get rid of the issue of having to decide between an integer or -1 when figuring out the right number of iterations.

0todd0000 commented 9 months ago

> When I read Nichols & Holmes 2002's paper, I never thought the random permutation tests they talked about involved picking a permutation and then putting it back in the sample.

Agreed. This is a limitation of spm1d and not the technique in general. spm1d 0.4 lazily chooses random permutations.

> If this handling of permutations does matter, do you think it would be more accurate to change the algorithm so that permutations aren't put back in the sample?

Yes, spm1d 0.5 will support random permutations without replacement.
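For readers wondering what sampling without replacement could look like, here is a minimal Python sketch. It is not the spm1d 0.5 implementation: the function name `sample_sign_flips` and the integer encoding of sign-flip patterns are assumptions for illustration only.

```python
import random

def sample_sign_flips(n_subjects, n_perm, seed=None):
    """Draw n_perm distinct sign-flip patterns (no pattern is repeated)."""
    rng = random.Random(seed)
    total = 2 ** n_subjects
    if n_perm > total:
        raise ValueError(f"n_perm ({n_perm}) exceeds the maximum ({total})")
    # random.sample draws without replacement; range() avoids storing all codes
    codes = rng.sample(range(total), n_perm)
    # Bit i of each code decides whether subject i's difference is flipped
    return [
        tuple(1 if (code >> i) & 1 else -1 for i in range(n_subjects))
        for code in codes
    ]

all_flips = sample_sign_flips(9, 512, seed=0)        # every pattern exactly once
subset = sample_sign_flips(20, 10000, seed=1)        # 10000 distinct patterns
print(len(set(all_flips)), len(set(subset)))
```

With this approach, requesting the maximum number of permutations yields every pattern exactly once, so the result would no longer depend on the random seed in that case.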

tramarobin commented 9 months ago

> Yes, spm1d 0.5 will support random permutations without replacement.

This is great news!

Thanks again for answering all my questions. I will change my code to use "-1" iterations in some cases. It should work as a temporary fix while waiting for v0.5.

0todd0000 / spm1d

Number of maximal permutations for non-parametric approach #265