ggerganov / ggml

Tensor library for machine learning
MIT License

feat: implement backward computation for more operators #921

Open Ronsor opened 1 month ago

Ronsor commented 1 month ago

This PR will add backward computations for most operators once completed.

Leaving pad, im2col, and norm for a future PR for now.

I'm currently unsure whether I should fuse the multiply and gradient computation for gelu_back/gelu_quick_back, as is done for silu_back.
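To illustrate what the fusion means, here is a minimal scalar sketch in Python (not ggml code; the function names `gelu_quick_back_fused` and the exact formula layout are illustrative assumptions). A fused backward op computes the activation's derivative and multiplies it by the upstream gradient in a single pass, instead of materializing the derivative as a separate tensor:

```python
import math

def gelu_quick(x):
    # gelu_quick(x) = x * sigmoid(1.702 * x), the sigmoid-based GELU approximation
    return x / (1.0 + math.exp(-1.702 * x))

def gelu_quick_back_fused(x, dy):
    # Hypothetical fused backward: compute d(gelu_quick)/dx and multiply by the
    # upstream gradient dy in one step, analogous in spirit to silu_back.
    s = 1.0 / (1.0 + math.exp(-1.702 * x))        # sigmoid(1.702 * x)
    dgelu = s + 1.702 * x * s * (1.0 - s)         # product rule on x * s
    return dy * dgelu
```

The unfused alternative would compute `dgelu` as its own op and then apply a separate multiply, which costs an extra pass over the data.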

JohannesGaessler commented 1 month ago

I'm currently working on adding training support for the MNIST example in https://github.com/ggerganov/ggml/pull/908. I have a working backward pass for im2col and pool2d (the ops needed for the convolutional neural network), and I'm cleaning up the code to get it into a reviewable state. When I added tests to test-grad0 I also added a fix to handle discontinuous gradients when numerically estimating the gradients to compare against backpropagation; this fix, or an equivalent one, will also be needed for clamp.
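A small scalar sketch of why clamp is problematic for numerical gradient checks (illustrative Python, not the ggml test code; function names are my own). The analytic gradient of clamp jumps from 1 to 0 at the boundaries, but a central difference straddling a boundary averages the two regimes and reports something in between:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def clamp_grad(x, lo, hi):
    # Analytic gradient: 1 strictly inside (lo, hi), 0 outside.
    return 1.0 if lo < x < hi else 0.0

def numerical_grad(f, x, eps=1e-4):
    # Central difference; unreliable when a kink of f lies inside (x-eps, x+eps).
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

For example, with `lo=0, hi=1` and `x` just above 1, the analytic gradient is 0 but the central difference returns a nonzero value, so a naive comparison would flag a false failure; the reference computation has to account for points near the kinks.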

ggerganov commented 1 month ago

It might be better to wait for @JohannesGaessler to merge #908 and then continue this PR?

Ronsor commented 1 month ago

That's probably best, considering the changes needed for the tests.

JohannesGaessler commented 3 weeks ago

I extended the code in test-backend-ops to enable checking gradients from backpropagation against numerically calculated gradients. New tests for gradients should be implemented there if possible (the only thing that currently doesn't work is support for FP16). In principle all that should be necessary is to add ggml_set_param to the existing tests (though tuning the parameters so that the reference values have good numerical precision can be tricky).
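The comparison described above can be sketched as follows (a simplified scalar version in Python; the real test-backend-ops code operates on ggml tensors, and the helper names and tolerances here are assumptions, not ggml's actual values):

```python
def rel_error(a, b, eps=1e-10):
    # Relative error between a backprop gradient and a numerical reference,
    # guarded against division by zero when both values are tiny.
    return abs(a - b) / max(abs(a), abs(b), eps)

def check_gradient(f, grad_f, xs, tol=1e-3, eps=1e-4):
    # For each test point, compare the analytic gradient grad_f against a
    # central-difference estimate of f's derivative.
    for x in xs:
        num = (f(x + eps) - f(x - eps)) / (2 * eps)
        if rel_error(grad_f(x), num) > tol:
            return False
    return True
```

The "tuning the parameters" caveat shows up here as the choice of `eps` and `tol`: too small an `eps` amplifies floating-point cancellation in the difference, too large an `eps` introduces truncation error, and inputs near non-smooth points of the op can break the comparison entirely.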

Ronsor commented 3 weeks ago

Perfect. I plan to finish this PR this weekend.