0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

non-parametric permutations #91

Closed romainmartinez closed 5 years ago

romainmartinez commented 5 years ago

Hi Todd and thank you for your amazing work on spm1d.

I have been using spm1d regularly for 2 years, for both 0D and 1D statistics. I mainly use non-parametric inference, so I spend a lot of time waiting for my computations to finish, which brings me to two questions:

  1. Do you have a rule of thumb about the number of permutations? I use something between 1000 and 10,000 depending on the size of the data. I typically choose a number that allows me to finish the calculations in about 5 minutes.

  2. Do you think that a multi-core calculation would be possible when using non-parametric inference? On the Python side, you could use something like dask or just the multiprocessing library included in the standard library. As for MATLAB, I think the equivalent parallel computing functionality is part of a paid toolbox.

Another unrelated question: have you considered putting spm1d on conda-forge? Many scientists use Anaconda and conda, and writing a recipe would be relatively fast since spm1d doesn't include any C/C++ code.

I could help with those features.

0todd0000 commented 5 years ago

Thank you for these questions.

  1. The number of permutations should be large enough to yield numerically stable results; 10,000 is usually sufficient, but if fewer than 10,000 unique permutations exist, then all of them should probably be used. If you choose only 10 permutations, for example, and repeat inference several times, each time starting from a different random state, you will find that the results are highly variable: 10 permutations yield only 10 test statistic values, which is not enough to characterize the distribution of the test statistic under the null hypothesis. As you increase the number of permutations, this variability decreases roughly exponentially, because more permutations characterize the null distribution more accurately, and it converges to a small, non-zero value. (A sketch of this kind of stability check appears after this list.)

  2. Multi-core calculation would indeed be possible for non-parametric inference; you'd just need to ensure that the permutations are distributed appropriately to each core. (See the second sketch below for one way to split the permutations into per-core chunks.)
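
Here is a minimal sketch of the stability check described in point 1, assuming spm1d's documented non-parametric interface (spm1d.stats.nonparam.ttest2 followed by inference with an iterations keyword); the arrays yA and yB are hypothetical placeholder data, and the attribute name zstar follows the spm1d inference objects as I recall them.

```python
# Hypothetical stability check: repeat non-parametric inference several times
# per iteration count and watch how much the critical threshold (zstar) varies.
import numpy as np
import spm1d

rng = np.random.RandomState(0)
yA  = rng.randn(10, 101)           # (observations x nodes), placeholder data
yB  = rng.randn(10, 101) + 0.5

for n_iter in (100, 1000, 10000):  # very large counts may require force_iterations=True
    zstars = []
    for _ in range(5):             # repeat with different random permutation subsets
        snpm  = spm1d.stats.nonparam.ttest2(yA, yB)
        snpmi = snpm.inference(alpha=0.05, two_tailed=True, iterations=n_iter)
        zstars.append(snpmi.zstar)  # critical threshold for this run
    print('iterations=%6d   zstar spread: %.4f' % (n_iter, np.ptp(zstars)))
```

With few iterations the spread of the threshold across repeats should be noticeably larger than with 10,000, which is the practical meaning of "numerically stable" above.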
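
And here is a minimal sketch of the multi-core idea from point 2, using only Python's standard-library multiprocessing module. It does not touch spm1d's internal permutation engine; it hand-rolls a max-|t| label-permutation test for a two-sample 1D comparison purely to illustrate how the permutations could be split into chunks, one per core, and recombined into a single null distribution. All data and function names here are hypothetical.

```python
# Hypothetical multi-core permutation test: each worker runs a chunk of random
# label permutations and returns the maximum |t| of each permutation; the
# chunks are then pooled into one null distribution of the maximum statistic.
import numpy as np
from multiprocessing import Pool

def tstat2(yA, yB):
    '''Two-sample t statistic at each node (equal-variance form).'''
    nA, nB = yA.shape[0], yB.shape[0]
    mA, mB = yA.mean(axis=0), yB.mean(axis=0)
    sA, sB = yA.var(axis=0, ddof=1), yB.var(axis=0, ddof=1)
    sp     = ((nA - 1) * sA + (nB - 1) * sB) / (nA + nB - 2)   # pooled variance
    return (mA - mB) / np.sqrt(sp * (1.0 / nA + 1.0 / nB))

def max_t_chunk(args):
    '''Run one chunk of random label permutations; return the max |t| of each.'''
    y, nA, n_perm, seed = args
    rng  = np.random.RandomState(seed)
    maxt = np.empty(n_perm)
    for i in range(n_perm):
        idx     = rng.permutation(y.shape[0])                  # shuffle group labels
        maxt[i] = np.abs(tstat2(y[idx[:nA]], y[idx[nA:]])).max()
    return maxt

if __name__ == '__main__':
    rng = np.random.RandomState(0)
    yA  = rng.randn(10, 101)              # (observations x nodes), placeholder data
    yB  = rng.randn(10, 101) + 0.5
    y   = np.vstack([yA, yB])
    n_cores, n_perm = 4, 10000
    chunks = [(y, yA.shape[0], n_perm // n_cores, seed) for seed in range(n_cores)]
    with Pool(n_cores) as pool:
        maxt = np.concatenate(pool.map(max_t_chunk, chunks))
    tstar = np.percentile(maxt, 95)       # critical threshold (alpha = 0.05 on |t|)
    print('Critical threshold t* = %.3f' % tstar)
```

The same chunk-and-recombine pattern should carry over to spm1d's own permuters, since each chunk only needs its own random seed and the null distribution is just the union of the per-chunk maxima.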

I'd be happy to put spm1d on conda-forge, but I would need a bit of help organizing this because I don't have time to set it up myself. If you'd like this to happen, please submit a pull request through GitHub, and I'll merge the changes into the main branch after testing them.

Thanks!