apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Feature request: Can anyone implement an operator equivalent to tensorflow.multinomial? #3831

Closed WarBean closed 7 years ago

WarBean commented 7 years ago

TensorFlow uses the multinomial operator for sampling inside the computational graph. It is useful for scheduled-sampling training of RNNs. Recent work on generating sentences with GANs (SeqGAN) also relies on tf.multinomial in its implementation.

I suppose that without such an operator, it's troublesome to integrate sampling into the computation flow during the training phase, so I tried to implement it myself. However, after some research I found no way to perform discrete-distribution sampling with the CUDA API. Here's an unsuccessful trial: (Generating samples from a non-uniform discrete distribution.)

Of course I can use mx.operator.CustomOp to write a NumPy version, but that's my last resort.

sxjscience commented 7 years ago

@WarBean You can first sample values from Uniform(0, 1) and then find which region in the CDF contains the value. For example, assume your probability distribution is prob, you can do the sampling via numpy.searchsorted(numpy.cumsum(prob), rng.rand()). It would be great if you could implement such a sampler for MXNet.
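The inverse-CDF approach sxjscience describes can be sketched in a few lines of NumPy (the `prob` vector below is just an illustrative example, not from the thread):

```python
import numpy as np

# Inverse-CDF sampling: draw u ~ Uniform(0, 1), then binary-search
# for the CDF segment that contains u. The returned index is the
# sampled category.
prob = np.array([0.1, 0.2, 0.7])   # example categorical distribution
cdf = np.cumsum(prob)              # [0.1, 0.3, 1.0]

rng = np.random.default_rng(42)
sample = np.searchsorted(cdf, rng.uniform())
```

For instance, a uniform draw of 0.25 falls between cdf[0] = 0.1 and cdf[1] = 0.3, so `searchsorted` returns category 1.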

You can find the code of the supported sampling ops here: https://github.com/dmlc/mxnet/blob/nnvm/src/operator/tensor/sample_op.h. Also, you may need to look at the code on the MShadow side, which contains the actual implementation.

We should be able to write all the common sampling functions (Gamma, t-distribution, Dirichlet, etc.) using the basic ones provided by cuRAND, such as Uniform, Normal, LogNormal, and Poisson.
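As one concrete example of building a richer sampler from Uniform and Normal draws, here is a NumPy sketch of the Marsaglia–Tsang rejection method for Gamma(alpha, 1) with alpha >= 1 (this particular algorithm is my illustration, not something specified in the thread):

```python
import numpy as np

def sample_gamma(alpha, rng, size=1):
    """Gamma(alpha, 1) via Marsaglia-Tsang rejection (requires alpha >= 1).

    Uses only Normal and Uniform draws, mirroring the idea that common
    distributions can be built on the basic cuRAND generators.
    """
    d = alpha - 1.0 / 3.0
    c = 1.0 / np.sqrt(9.0 * d)
    out = np.empty(size)
    for i in range(size):
        while True:
            x = rng.standard_normal()
            v = (1.0 + c * x) ** 3
            if v <= 0:
                continue                      # reject: v must be positive
            u = rng.uniform()
            # Squeeze/acceptance test in log space for numerical stability.
            if np.log(u) < 0.5 * x * x + d - d * v + d * np.log(v):
                out[i] = d * v
                break
    return out
```

The same accept/reject loop maps naturally onto one GPU thread per output element, with cuRAND supplying the Normal and Uniform draws.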

WarBean commented 7 years ago

@sxjscience After reading sample_op.h I have some idea of how to proceed with a CPU version (logic like numpy.searchsorted(numpy.cumsum(prob), rng.rand()), as you mentioned), but I'm not sure how to proceed with the GPU version. More precisely, I don't know how to use the CUDA kernel API to sample in parallel within a mini-batch. Could you offer some direction on how to directly call a CUDA kernel (a function with the __device__ qualifier) from the Forward and Backward functions? Or is there some other approach?
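The per-row work that a mini-batch CUDA kernel would parallelize (one thread per batch row: build the row's CDF, then locate a uniform draw in it) can be sketched in vectorized NumPy; this is my own illustration of the logic, not code from MXNet:

```python
import numpy as np

def batch_multinomial(prob, rng):
    """Draw one category per row of `prob` (shape: batch x K, rows sum to 1).

    Each row performs exactly what a single CUDA thread would do in a
    batched kernel: compute the row's CDF and find where a Uniform(0, 1)
    draw falls in it.
    """
    cdf = np.cumsum(prob, axis=1)               # per-row CDFs
    u = rng.uniform(size=(prob.shape[0], 1))    # one uniform draw per row
    # Counting the CDF entries each draw exceeds gives the sampled index,
    # equivalent to a per-row searchsorted.
    return (u > cdf).sum(axis=1)
```

A real kernel would replace the row loop implied here with `blockIdx`/`threadIdx` indexing and draw the uniforms with cuRAND's device API, but the arithmetic per row is identical.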

sxjscience commented 7 years ago

@WarBean You can directly call CUDA kernels in the forward and backward functions. https://github.com/dmlc/mxnet/blob/master/src/operator/roi_pooling.cu is one example.

WarBean commented 7 years ago

@sxjscience Exactly what I need. I will try it.

asmushetzel commented 7 years ago

mxnet now (since a week ago) has many additional sampling operators; see sample_op.h/cc and multisample-op.h/cc. The latter version is the one where the parameters of the distributions are input tensors. So far we have implemented exponential, gamma, Poisson, and negative binomial, and only for CPUs. We will look at whether we can add CPU support for multinomial easily. Getting them all onto GPUs as well is also something we will look at.

sxjscience commented 7 years ago

@asmushetzel What's the current progress on GPU support?

asmushetzel commented 7 years ago

Almost done. I hope we can get the pull request out next week. This will also change the CPU implementations, as we are using generic code that is equivalent on GPU and CPU. One thing: this issue is specifically about the multinomial distribution. That distribution was brought in independently by Eric some time ago and already supports CPU/GPU, so I'm not sure why this issue is still open. Shouldn't we close it?

sxjscience commented 7 years ago

OK, do we have a separate issue for the general sampling ops? I'm going to close this one, as it's only about multinomial, which is already supported: https://mxnet.incubator.apache.org/api/python/symbol.html#mxnet.symbol.sample_multinomial

asmushetzel commented 7 years ago

We don't have a separate issue for general sampling. Feel free to open one; otherwise I will cc you on the upcoming pull request.

piiswrong commented 7 years ago

We already have multinomial, and it works on both CPU and GPU.