I'd like to add a Cython API

bashtage / ng-numpy-randomstate

Numpy-compatible random number generator that supports multiple core pseudo RNGs and explicitly parallel generation.

I'd like to add a Cython API #47

Closed honnibal closed 8 years ago

honnibal commented 8 years ago

Hi,

I need to repeatedly create a vector of around 10^6 random doubles, as part of some neural network code I'm writing in Cython. I want to release the GIL around this function, so I need:

1) A fast PRNG; 2) With a permissive license; 3) With a public C-level API.

It looks like you've got 1 and 2, so I'd like to see whether I can add 3 :). For my purposes, the ideal API would be something like this:


cdef void n_doubles_from_normal(double* result, int n, int seed) nogil:
   ...

I understand that the goal of this repository is to get integrated into numpy. But, would you accept a pull request with a Cython .pxd file, the nogil functions, and the appropriate changes to the setup.py?

bashtage commented 8 years ago

Hi,

Not sure I totally see at what level this API would exist. Would it hang off of a RandomState instance? I suppose it needs to avoid directly handling the actual state.

I also don't quite understand the function name - why _from_normal?

Presumably you would want a simple wrapper of the function random_uniform_fill in distributions.c (https://github.com/bashtage/ng-numpy-randomstate/blob/master/randomstate/distributions.c#L39 ).

I'm still not totally sure how you can completely avoid the GIL, since using a RandomState instance requires accessing self, which requires the GIL.

honnibal commented 8 years ago

Sorry, I should've been a bit clearer. By normal I meant: draw from a normal distribution. I thought I'd said above that I need to draw from a Gaussian, but I see that I didn't.

You can read from and write to self attributes without the GIL, so long as they're cdef attributes. You can't access self.foo if foo is a Python object, though. I guess I should've spent more time understanding the design. I'll take another look.

honnibal commented 8 years ago

Hmm. You could have a cdef method of RandomState that made a call to random_gauss_fill. But that actually isn't so helpful. In the Python version, you have the state object as a global variable, and you just add these methods to the global namespace by assigning them to global variables.

In Cython this wouldn't work, so you'd only be able to use these cdef methods if you first create a new RandomState instance, or pass one in. But in both cases, you'd have to acquire the GIL.

Maybe have a cdef function that did the setup and teardown around a call to random_uniform_fill?

bashtage commented 8 years ago

I would suppose the simplest method would be to write a basic functional interface that would have signatures like


cdef void seed(aug_state* state) nogil:
    # Do seeding stuff
    pass

cdef void normals(aug_state* state, double* out, int n) nogil:
    random_normal_fill(state, out, n)

The only difficulty with this is that the structure aug_state isn't very friendly.

Some of the PRNGs are easier to use than others. In particular, xorshift uses arrays of uint64, so a basic state could be easily manipulated using only NumPy arrays (or directly using malloc). This isn't really the same as an aug_state, which has placeholders for lots of other stuff that isn't needed for most distributions.
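To make the "caller owns the state" idea concrete, here is a rough pure-Python sketch of such a functional interface. It is only an illustration, not code from this repository: `State`, `seed`, `next_uint64`, `next_double` and `normals` are hypothetical names, the generator is a single-word xorshift64* (constants quoted from memory of the public-domain reference code, so check them before relying on them), and the normals come from the polar method rather than Ziggurat.

```python
import math
from array import array

MASK64 = (1 << 64) - 1


class State:
    """Hypothetical minimal generator state: a single 64-bit word."""

    def __init__(self):
        self.x = 1  # an xorshift state must never be zero


def seed(state, value):
    """Set the state from an integer seed, avoiding the all-zero state."""
    state.x = (value & MASK64) or 1


def next_uint64(state):
    """One xorshift64* step on the caller's state (constants are an assumption)."""
    x = state.x
    x ^= x >> 12
    x ^= (x << 25) & MASK64
    x ^= x >> 27
    state.x = x
    return (x * 0x2545F4914F6CDD1D) & MASK64


def next_double(state):
    """Uniform double in [0, 1) built from the top 53 bits."""
    return (next_uint64(state) >> 11) / float(1 << 53)


def normals(state, out, n):
    """Fill out[0:n] with standard normal draws (polar method, not Ziggurat)."""
    i = 0
    while i < n:
        u = 2.0 * next_double(state) - 1.0
        v = 2.0 * next_double(state) - 1.0
        r2 = u * u + v * v
        if r2 >= 1.0 or r2 == 0.0:
            continue  # reject points outside the unit disc
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        out[i] = u * f
        i += 1
        if i < n:
            out[i] = v * f
            i += 1


state = State()
seed(state, 12345)
buf = array("d", [0.0] * 1000)  # caller-owned output buffer
normals(state, buf, len(buf))
```

Nothing here touches a global or a Python-level generator object beyond the state passed in, which is the property a nogil version in Cython or C would need.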

honnibal commented 8 years ago

I guess I really only need xorshift + Ziggurat. So maybe I should just extract the things I need and make my own little package of them.

I was tempted to say the state can just live as a global variable. But if there are race conditions that make the random sequence unpredictable at unpredictable times, eventually I'll probably go crazy debugging something. So I should probably just accept some setup/teardown.
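The reproducibility concern can be illustrated with a small sketch. It uses Python's standard-library `random.Random` (a Mersenne Twister) purely as a stand-in for any generator with explicit state, and it simulates two workers by interleaving their draws in one thread rather than running real threads; `interleaved` is a hypothetical helper.

```python
import random


def interleaved(rng_a, rng_b, n):
    """Collect n draws for worker A while worker B draws in between."""
    out = []
    for _ in range(n):
        out.append(rng_a.random())
        rng_b.random()  # another worker advancing its generator
    return out


# The sequence worker A expects when nobody else touches its generator.
ref = random.Random(1)
reference = [ref.random() for _ in range(4)]

# Separate states: worker B's draws cannot disturb worker A's sequence.
private = interleaved(random.Random(1), random.Random(2), 4)

# One shared state: worker B consumes values that A would have seen.
shared_rng = random.Random(1)
shared = interleaved(shared_rng, shared_rng, 4)

print("private state reproduces the reference:", private == reference)
print("shared state reproduces the reference:", shared == reference)
```

With real threads the interleaving would also vary from run to run, so the shared-state sequence would not even be wrong in a consistent way.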

Thanks for the help.

bashtage commented 8 years ago

I think you are right: in the special case where multithreaded code needs to release the GIL, a lot of care is needed, so it is probably easiest to use xorshift1024 directly, with splitmix64 for seeding.
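For reference, a minimal pure-Python sketch of that combination might look like the following. It is an assumption-laden illustration, not this repository's code: the function names and the dict-based state are hypothetical, and the splitmix64 and xorshift1024* constants are quoted from memory of the public-domain reference implementations, so they should be checked against the actual C source before porting.

```python
MASK64 = (1 << 64) - 1


def splitmix64(counter):
    """Advance a splitmix64 counter; return (new_counter, output)."""
    counter = (counter + 0x9E3779B97F4A7C15) & MASK64
    z = counter
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return counter, z ^ (z >> 31)


def seed_xorshift1024(seed):
    """Expand one integer seed into 16 state words plus a position index."""
    counter = seed & MASK64
    words = []
    for _ in range(16):
        counter, word = splitmix64(counter)
        words.append(word)
    return {"s": words, "p": 0}


def xorshift1024_next(state):
    """One xorshift1024* step on the caller's state; returns a 64-bit integer."""
    s = state["s"]
    s0 = s[state["p"]]
    state["p"] = (state["p"] + 1) & 15
    s1 = s[state["p"]]
    s1 ^= (s1 << 31) & MASK64
    s[state["p"]] = s1 ^ s0 ^ (s1 >> 11) ^ (s0 >> 30)
    return (s[state["p"]] * 1181783497276652981) & MASK64


def fill_uniform(state, out):
    """Fill a preallocated list with doubles in [0, 1) from the top 53 bits."""
    for i in range(len(out)):
        out[i] = (xorshift1024_next(state) >> 11) / float(1 << 53)


state = seed_xorshift1024(2016)
values = [0.0] * 8
fill_uniform(state, values)
```

The whole state is 16 unsigned 64-bit words and one small index, which is why it maps easily onto a plain C struct or a malloc'd buffer that a nogil function can own.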