[Closed] romainmartinez closed this issue 5 years ago
Thank you for these questions.
The number of permutations should be large enough to yield numerically stable results; 10,000 is usually sufficient, but if fewer than 10,000 unique permutations exist, then all of them should probably be used. If you choose only 10 permutations, for example, and repeat inference a number of times, each time starting with a different random state, you will find that the results are highly variable: 10 permutations yield only 10 test statistic values, which is far too few to characterize the test statistic distribution under the null hypothesis. As you increase the number of permutations, this variability decreases roughly exponentially, because more permutations characterize the null distribution more accurately, and it converges to a small, non-zero value.
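To illustrate the point, here is a minimal sketch (not spm1d's implementation; the data, the perm_pvalue helper, and all parameter values are hypothetical) of a two-sample 0D permutation test run repeatedly with different random states. The spread of the p-values shrinks as the permutation count grows:

```python
import numpy as np

def perm_pvalue(a, b, n_perm, rng):
    """Two-sample 0D permutation test: p-value for the absolute difference in means."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)           # random relabelling of the pooled data
        stat = abs(perm[:a.size].mean() - perm[a.size:].mean())
        if stat >= observed:
            count += 1
    return count / n_perm

rng0 = np.random.default_rng(0)
a = rng0.normal(0.0, 1.0, 12)   # hypothetical group A
b = rng0.normal(0.8, 1.0, 12)   # hypothetical group B, shifted mean

# Repeat inference with 20 different random states for each permutation count;
# the standard deviation of the resulting p-values shrinks as n_perm grows.
for n_perm in (10, 100, 1000):
    ps = [perm_pvalue(a, b, n_perm, np.random.default_rng(s)) for s in range(20)]
    print(n_perm, round(float(np.std(ps)), 4))
```

With 10 permutations the p-value can only take the values 0.0, 0.1, ..., 1.0, which is why the variability across random states is so large at small permutation counts.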
Multi-core calculation would indeed be possible for non-parametric inference, you'd just need to ensure that the permutations are distributed appropriately to each core.
I'd be happy to put spm1d on conda-forge, but would need a bit of help organizing this because I don't have time to set it up myself. If you'd like this to happen please submit a pull request through GitHub, and I'll merge the changes with the main branch after testing them.
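For reference, a conda-forge recipe for a pure-Python package is essentially a single meta.yaml submitted to the staged-recipes repository. The sketch below is illustrative only: the version, sha256, license, and dependency list are placeholders that would need to be checked against spm1d's actual setup:

```yaml
# Sketch of a conda-forge meta.yaml for a pure-Python package.
# version, sha256, license and dependencies below are placeholders.
{% set name = "spm1d" %}
{% set version = "0.0.0" %}  # placeholder

package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: 0000000000000000000000000000000000000000000000000000000000000000  # placeholder

build:
  noarch: python            # possible because spm1d ships no compiled C/C++ code
  number: 0
  script: {{ PYTHON }} -m pip install . -vv

requirements:
  host:
    - python
    - pip
  run:
    - python
    - numpy
    - scipy
    - matplotlib-base

about:
  home: https://spm1d.org
  license: GPL-3.0-only     # placeholder; confirm against the project's license file
  summary: One-dimensional Statistical Parametric Mapping
```

Since the package is pure Python, a single noarch build covers all platforms, which keeps the recipe and its maintenance burden small.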
Thanks!
Hi Todd, and thank you for your amazing work on spm1d.
I have been using spm1d regularly for 2 years, for both 0D and 1D statistics. I mainly use non-parametric inference, so I spend a lot of time waiting for my computations to finish. Which brings me to two questions:
Do you have a rule of thumb for the number of permutations? I use something between 1,000 and 10,000 depending on the size of the data; I typically choose a number that lets the calculations finish in about 5 minutes.
Do you think a multi-core calculation would be possible when using non-parametric inference? On the Python side, you could use something like dask or just the multiprocessing library included in Python. As for Matlab, I think the equivalent functionality is part of the paid toolboxes.
Another unrelated question: did you consider putting spm1d on conda-forge? Many scientists use Anaconda and conda, and writing a recipe would be relatively fast since spm1d doesn't include C/C++ code.
I could help with those features.