It seems that ARS-5 might be the fastest counter-based RNG out there on CPUs that have AES instructions. Is there any reason it is not included? MKL uses ARS-5 as one of its random generators, so it should be fairly mature. Since ARS is essentially a simplified version of the AES-CTR generator already in this repo, would it be better for you to open a branch with your AES round-function code, or what would you advise for reusing that part of the code?
BTW, I apologize for using the phrase 'it's not vectorized' in the NumPy discussion; it may have come across as quite harsh. I was trying to defend my idea for qrand, whose core concept is ARS, modified to use fewer rounds and to achieve dSFMT-style vectorization using VAES and AVX. This library is awesome for experimenting with various random generators, and I would be delighted to help add a vectorized ARS-5 to it if you think it's worthwhile.
I took a look, and I think not, since the ARS implementation has no fallback slow path that could be used on CPUs that do not support AESNI (i.e., outside of x86_64).
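To make the fallback point concrete, here is a minimal portable C sketch of an ARS block function with no AESNI dependency. It assumes the round structure of Random123's ars.h as I recall it: XOR the counter with the key, then for each round add a per-lane 64-bit Weyl increment to the key and apply one AES round, with the final round omitting MixColumns (matching aesenclast). All names here (ars_block, aes_round, init_sbox, etc.) are hypothetical, not randomgen's API, and the output has not been checked against Random123's known-answer vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Weyl increments as recalled from Random123's ars.h:
   low lane = golden ratio, high lane = sqrt(3) - 1 (assumption, verify). */
#define ARS_WEYL_LO UINT64_C(0x9E3779B97F4A7C15)
#define ARS_WEYL_HI UINT64_C(0xBB67AE8584CAA73B)

/* Multiply by {02} in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1. */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication by shift-and-add. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* AES S-box built at runtime (inverse + affine map); init is not thread-safe. */
static uint8_t sbox[256];
static int sbox_ready = 0;

static void init_sbox(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; x != 0 && y < 256; y++)
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        /* s = b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63 */
        uint8_t b = inv, s = inv;
        for (int i = 0; i < 4; i++) {
            b = (uint8_t)((b << 1) | (b >> 7));
            s ^= b;
        }
        sbox[x] = (uint8_t)(s ^ 0x63);
    }
    sbox_ready = 1;
}

/* MixColumns on one 4-byte column: multiply by the matrix [2 3 1 1; 1 2 3 1; ...]. */
static void mix_column(uint8_t a[4]) {
    uint8_t a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
    a[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
    a[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
    a[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
    a[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
}

/* One AES round on a column-major state (byte i = row i%4, column i/4),
   like aesenc; if last != 0, MixColumns is skipped, like aesenclast. */
static void aes_round(uint8_t s[16], const uint8_t rk[16], int last) {
    uint8_t t[16];
    /* ShiftRows + SubBytes: row r of column c comes from column (c + r) % 4. */
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = sbox[s[4 * ((c + r) % 4) + r]];
    if (!last)
        for (int c = 0; c < 4; c++) mix_column(&t[4 * c]);
    for (int i = 0; i < 16; i++) s[i] = (uint8_t)(t[i] ^ rk[i]);
}

static void store_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* ARS-N block: ctr/key/out are {low lane, high lane}, matching __m128i byte order. */
static void ars_block(unsigned rounds, const uint64_t ctr[2],
                      const uint64_t key[2], uint64_t out[2]) {
    assert(rounds >= 1 && rounds <= 10);
    if (!sbox_ready) init_sbox();
    uint8_t s[16], rk[16];
    uint64_t k0 = key[0], k1 = key[1];
    store_le64(s, ctr[0] ^ k0);          /* explicit initial key XOR */
    store_le64(s + 8, ctr[1] ^ k1);
    for (unsigned r = 1; r <= rounds; r++) {
        k0 += ARS_WEYL_LO;               /* per-lane add, wraps mod 2^64 */
        k1 += ARS_WEYL_HI;
        store_le64(rk, k0);
        store_le64(rk + 8, k1);
        aes_round(s, rk, r == rounds);   /* rounds-1 full rounds, then a last round */
    }
    out[0] = load_le64(s);
    out[1] = load_le64(s + 8);
}
```

For ARS-5 one would call ars_block(5, ctr, key, out) and increment ctr per 128-bit block. Building the S-box at runtime avoids hand-typing a 256-entry table; a real fallback would more likely use a precomputed table, and a table-driven software AES like this is much slower than AESENC.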