fff-rs / juice

The Hacker's Machine Learning Engine

Implement BatchNormalization2d Layer #145

Open colepoirier opened 3 years ago

colepoirier commented 3 years ago

Hi @drahnr,

I'd like to take a crack at implementing the BatchNormalization2d layer in juice.

Referring back to #10, you outlined that adding new cuDNN layers to juice is a 4-step task.

Unfortunately, I am having trouble deciphering how you turned the unsafe extern "C" version of a function like sigmoid_forward into the safe version that now exists in rcudnn, which I see is the basis for adding it to coaster-nn and then juice.

Can you help me by outlining exactly how you want contributions like this to be carried out in the codebase?

Thanks very much for your help!

Cole

drahnr commented 3 years ago

Hey @colepoirier, that's awesome!

The process is a bit tedious, especially the FFI to safe Rust step. Let me walk you through it:

First, there is a bindgen-based *-sys crate that lets us autogenerate the C bindings, but they are not very idiomatic Rust. So the next layer imports the -sys crate as ffi https://github.com/spearow/juice/blob/master/rcudnn/cudnn/src/lib.rs#L75 and wraps each ffi::* function in a still-unsafe Rust function, but now with a Result type mapped from whatever mechanism the C API uses to convey the return status, usually a returned C enum https://github.com/spearow/juice/blob/master/rcudnn/cudnn/src/api/activation.rs#L81-L98.
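To make the status-to-Result step concrete, here is a minimal sketch. It is not the rcudnn code: the `ffi` module is a plain-Rust stand-in for a generated -sys crate, and `toy_status_t`, `toy_sigmoid_forward`, `Error` and `ffi_sigmoid_forward` are all hypothetical names chosen for illustration.

```rust
/// Stand-in for a bindgen-generated `-sys` crate. A real one would contain
/// `extern "C"` declarations; plain Rust is used here so the sketch builds
/// without cuDNN. Every name in this module is hypothetical.
#[allow(non_camel_case_types, dead_code)]
mod ffi {
    #[repr(C)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum toy_status_t {
        TOY_STATUS_SUCCESS = 0,
        TOY_STATUS_BAD_PARAM = 1,
        TOY_STATUS_NOT_SUPPORTED = 2,
    }

    /// Pretend C function: writes sigmoid(src[i]) into dst[i] for i in 0..n.
    pub unsafe fn toy_sigmoid_forward(src: *const f32, dst: *mut f32, n: usize) -> toy_status_t {
        if src.is_null() || dst.is_null() {
            return toy_status_t::TOY_STATUS_BAD_PARAM;
        }
        for i in 0..n {
            unsafe {
                let x = *src.add(i);
                *dst.add(i) = 1.0 / (1.0 + (-x).exp());
            }
        }
        toy_status_t::TOY_STATUS_SUCCESS
    }
}

/// Hypothetical error type mirroring the C status enum.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    BadParam(&'static str),
    NotSupported(&'static str),
}

/// Still `unsafe` because it takes raw pointers, but the C status code is
/// now mapped to a `Result`.
pub unsafe fn ffi_sigmoid_forward(src: *const f32, dst: *mut f32, n: usize) -> Result<(), Error> {
    match unsafe { ffi::toy_sigmoid_forward(src, dst, n) } {
        ffi::toy_status_t::TOY_STATUS_SUCCESS => Ok(()),
        ffi::toy_status_t::TOY_STATUS_BAD_PARAM => Err(Error::BadParam("a pointer argument was null")),
        ffi::toy_status_t::TOY_STATUS_NOT_SUPPORTED => Err(Error::NotSupported("operation not supported")),
    }
}
```

The caller still has to guarantee valid pointers and lengths at this layer; only the error reporting has become Rust-like.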

The next layer makes it safe by catching any invalid preconditions. In most cases this just means mapping the input arguments, after which the function call is safe. We can also add input checks based on the documented valid inputs and outputs, but usually the C function already does this, so there is no need to.
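As a rough illustration of that safe layer, the sketch below wraps a pointer-based function in one that takes slices. The names are hypothetical and the inner `ffi_sigmoid_forward` is a plain-Rust stand-in, not the rcudnn function; the point is only that the preconditions the unsafe call relies on are established before the call.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    BadParam(&'static str),
}

/// Stand-in for the still-unsafe, `Result`-returning wrapper around the C
/// function. The caller must pass pointers valid for `n` elements.
unsafe fn ffi_sigmoid_forward(src: *const f32, dst: *mut f32, n: usize) -> Result<(), Error> {
    for i in 0..n {
        unsafe {
            let x = *src.add(i);
            *dst.add(i) = 1.0 / (1.0 + (-x).exp());
        }
    }
    Ok(())
}

/// Safe version: slices carry their own length, so the only precondition
/// left to check by hand is that both lengths agree.
pub fn sigmoid_forward(src: &[f32], dst: &mut [f32]) -> Result<(), Error> {
    if src.len() != dst.len() {
        return Err(Error::BadParam("src and dst must have the same length"));
    }
    // SAFETY: both pointers come from live slices of `src.len()` elements,
    // and a shared and a mutable borrow cannot overlap.
    unsafe { ffi_sigmoid_forward(src.as_ptr(), dst.as_mut_ptr(), src.len()) }
}
```

With this shape, a caller cannot trigger undefined behaviour from safe code; a mismatched length becomes an `Err` instead of an out-of-bounds write.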

At this point there are safe Rust fns that we can use. We do something similar with structs, e.g. https://github.com/spearow/juice/blob/master/rcudnn/cudnn/src/activation_descriptor.rs, which are used in the safe Rust API instead of the generated C types.
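For structs, one common pattern (assumed here, not copied from rcudnn) is a Rust type that owns the raw C handle, creates it in a constructor and destroys it in `Drop`. In the sketch below the `ffi` module, `RawDescriptor`, `ToyActivationDescriptor` and the `LIVE` counter are all hypothetical; the counter exists only so the effect of `Drop` can be observed.

```rust
/// Plain-Rust stand-in for generated C bindings. All names are hypothetical.
#[allow(dead_code)]
mod ffi {
    use std::sync::atomic::{AtomicUsize, Ordering};

    /// Counts live descriptors so the sketch can show that `Drop` runs.
    pub static LIVE: AtomicUsize = AtomicUsize::new(0);

    pub struct RawDescriptor {
        pub coef: f64,
    }

    /// Returns 0 on success, as many C APIs do.
    pub unsafe fn create(out: *mut *mut RawDescriptor, coef: f64) -> i32 {
        if out.is_null() {
            return 1;
        }
        unsafe { *out = Box::into_raw(Box::new(RawDescriptor { coef })) };
        LIVE.fetch_add(1, Ordering::SeqCst);
        0
    }

    pub unsafe fn destroy(desc: *mut RawDescriptor) -> i32 {
        if desc.is_null() {
            return 1;
        }
        drop(unsafe { Box::from_raw(desc) });
        LIVE.fetch_sub(1, Ordering::SeqCst);
        0
    }
}

/// Safe owner of a raw descriptor handle.
pub struct ToyActivationDescriptor {
    raw: *mut ffi::RawDescriptor,
}

impl ToyActivationDescriptor {
    pub fn new(coef: f64) -> Result<Self, &'static str> {
        let mut raw = std::ptr::null_mut();
        match unsafe { ffi::create(&mut raw, coef) } {
            0 => Ok(ToyActivationDescriptor { raw }),
            _ => Err("descriptor creation failed"),
        }
    }

    pub fn coef(&self) -> f64 {
        // SAFETY: `raw` was produced by `ffi::create` and is freed only in `drop`.
        unsafe { (*self.raw).coef }
    }
}

impl Drop for ToyActivationDescriptor {
    fn drop(&mut self) {
        // SAFETY: `raw` is valid and destroyed exactly once, here.
        unsafe {
            ffi::destroy(self.raw);
        }
    }
}

/// Number of descriptors currently alive (for demonstration only).
pub fn live_descriptors() -> usize {
    ffi::LIVE.load(std::sync::atomic::Ordering::SeqCst)
}
```

Tying destruction to `Drop` means users of the safe API never call a destroy function themselves and cannot leak or double-free the handle from safe code.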

This is now the API that e.g. rcudnn or rcublas provide, and it is ready to be used with coaster.


coaster-nn now handles the specifics of the implementation per backend, as you already know. There are two key things, outlined in https://github.com/spearow/juice/blob/master/coaster-nn/src/frameworks/cuda/mod.rs#L431-L438

One is defining an abstract configuration that works across all backends, so it should not contain any backend specifics. The other is the backend-specific impl, e.g. impl<T> Sigmoid<T> for Backend<Cuda>, which is a requirement for using the operation with that backend. Note that a lot of the impl has been abstracted with small convenience macros to avoid repetitive code.
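The split between a backend-agnostic configuration and a per-backend impl could look roughly like the sketch below. It is heavily simplified and assumes things the real crates do not do: it works on slices instead of tensors, the `Normalize` trait and `NormalizeConfig` are hypothetical, and only a toy `Native` backend is implemented (a `Cuda` impl would call into the safe rcudnn API instead).

```rust
use std::marker::PhantomData;

/// Marker types for frameworks. `Cuda` is left unimplemented in this sketch.
pub struct Native;
#[allow(dead_code)]
pub struct Cuda;

/// Simplified stand-in for a backend that is generic over its framework.
pub struct Backend<F> {
    _framework: PhantomData<F>,
}

impl<F> Backend<F> {
    pub fn new() -> Self {
        Backend { _framework: PhantomData }
    }
}

/// Backend-agnostic configuration: no handles, descriptors or device types.
#[derive(Debug, Clone, Copy)]
pub struct NormalizeConfig {
    /// Added to the variance to avoid dividing by zero.
    pub epsilon: f32,
}

/// Hypothetical operation trait that every supporting backend implements.
pub trait Normalize<T> {
    fn normalize(&self, x: &[T], config: &NormalizeConfig) -> Vec<T>;
}

impl Normalize<f32> for Backend<Native> {
    fn normalize(&self, x: &[f32], config: &NormalizeConfig) -> Vec<f32> {
        if x.is_empty() {
            return Vec::new();
        }
        let n = x.len() as f32;
        let mean = x.iter().sum::<f32>() / n;
        let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
        let scale = (var + config.epsilon).sqrt();
        x.iter().map(|v| (v - mean) / scale).collect()
    }
}
```

Code written against `Normalize<f32>` then compiles for any backend that provides the impl, while `NormalizeConfig` stays identical across them.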


The final step to enable a new layer type is to add it to the trait LayerOps. This is far from optimal, but Rust still has a few limitations that prevent more fine-grained handling of which backend implements which operation, on a per-backend basis yet at compile time. https://github.com/spearow/juice/blob/b80fb54a977722b308c04ad5301eb6895d08d612/juice/src/util.rs#L103-L141
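The idea of one trait that bundles all operations can be sketched as a supertrait with a blanket impl. This is an assumption about the general shape only, not a copy of juice's util.rs: the slice-based `Sigmoid` and `Relu` traits, `NativeBackend` and `run_layers` are hypothetical.

```rust
/// Hypothetical, slice-based operation traits.
pub trait Sigmoid<T> {
    fn sigmoid(&self, x: &[T]) -> Vec<T>;
}

pub trait Relu<T> {
    fn relu(&self, x: &[T]) -> Vec<T>;
}

/// Bundles every operation a layer may need. A new operation becomes
/// available to layers once it is added to this supertrait list.
pub trait LayerOps<T>: Sigmoid<T> + Relu<T> {}

/// Any backend implementing all listed operations gets `LayerOps` for free.
impl<T, B: Sigmoid<T> + Relu<T>> LayerOps<T> for B {}

pub struct NativeBackend;

impl Sigmoid<f32> for NativeBackend {
    fn sigmoid(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|&v| 1.0 / (1.0 + (-v).exp())).collect()
    }
}

impl Relu<f32> for NativeBackend {
    fn relu(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|&v| v.max(0.0)).collect()
    }
}

/// Layer code only needs the single `LayerOps` bound.
pub fn run_layers<B: LayerOps<f32>>(backend: &B, x: &[f32]) -> Vec<f32> {
    backend.sigmoid(&backend.relu(x))
}
```

The cost of this design is the one mentioned above: a backend must implement every listed operation to satisfy the bound, even if a given network uses only a few of them.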

Feel free to ping me on https://gitter.im/spearow/juice or here, looking forward to your first PR!

colepoirier commented 3 years ago

Thank you so much for this thorough guide and explanation! I plan on starting my attempt today and will ping you on gitter if (when) I get stuck :)