FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

Allow BatchNorm training on CUDA with `track_stats=false` #1606

Open ToucheSir opened 3 years ago

ToucheSir commented 3 years ago

Gathered from https://discourse.julialang.org/t/batchnorm-only-track-stats-true-supported-on-gpu/62091.

This would most likely require changes in NNlibCUDA as well. I'm not sure how interchangeable the various cudnnBatchNormalizationForward* functions are, so I'm putting a pin in this until someone more knowledgeable can comment.
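
For context, a minimal reproduction along the lines of the Discourse thread (the exact error text and affected versions come from that thread, not re-verified here):

```julia
using Flux, CUDA

m = BatchNorm(3; track_stats=false) |> gpu
x = CUDA.rand(Float32, 8, 8, 3, 4)   # WHCN image batch

# On affected Flux/NNlibCUDA versions this errors with something like
# "BatchNorm: only track_stats=true supported on GPU", because the
# cuDNN path assumes running statistics are being tracked.
m(x)
```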

CarloLucibello commented 3 years ago

Looking at the cuDNN docs, it seems we should be able to support this:

resultRunningMean, resultRunningVariance
Inputs/Outputs. Running mean and variance tensors (these have the same descriptor as the bias and scale). Both of these pointers can be NULL, but only at the same time. The value stored in resultRunningVariance (or passed as an input in inference mode) is the sample variance and is the moving average of variance[x], where the variance is computed either over batch or spatial+batch dimensions depending on the mode. If these pointers are not NULL, the tensors should be initialized to some reasonable values or to 0.
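
In Flux terms, that constraint would translate into pointer selection along these lines when `track_stats=false` (a hedged sketch; `running_stat_ptrs` is a hypothetical helper, not existing NNlibCUDA code):

```julia
using CUDA  # provides CU_NULL and CuArray

# Hypothetical helper illustrating the rule quoted above: the running-mean
# and running-variance pointers may be NULL, but only both at once.
function running_stat_ptrs(running_mean, running_var)
    if running_mean === nothing && running_var === nothing
        return CU_NULL, CU_NULL                              # track_stats=false
    elseif running_mean isa CuArray && running_var isa CuArray
        return pointer(running_mean), pointer(running_var)   # track_stats=true
    else
        throw(ArgumentError("running mean and variance must both be tracked or both be nothing"))
    end
end
```

The actual change would presumably live in NNlibCUDA's batchnorm wrapper, which would pass these pointers through to the cudnnBatchNormalizationForward* calls.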
prikmm commented 1 year ago

Hey @CarloLucibello,

I would like to work on this. Can you assign this to me?

I have no prior Julia experience, but I have good CUDA and C++ experience.

ToucheSir commented 1 year ago

We don't explicitly assign issues for people to work on them. If you're interested, feel free to file a PR and we'll start from there.

paulnovo commented 1 month ago

Should this be closed? My PRs have been merged.