**Ronsor** opened 3 months ago
I'm currently working on adding training support for the MNIST example in https://github.com/ggerganov/ggml/pull/908 . I have a working backward pass for `im2col` and `pool2d` (the ops needed for the convolutional neural network). I'm currently cleaning up the code and putting it into a state that can be reviewed. When I added tests to `test-grad0`, I also added a fix to deal with noncontiguous gradients when numerically calculating the gradients to compare against backpropagation; this fix, or an equivalent one, will also be needed for `clamp`.
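As background for the `im2col` backward pass discussed above, here is a minimal pure-Python sketch of the general idea (illustrative only, not ggml's implementation): the forward pass unrolls input patches into columns so convolution becomes a matrix multiplication, and the backward pass (`col2im`) scatters gradients back, summing wherever patches overlapped.

```python
def im2col(x, kh, kw):
    """x: 2D list (H x W). Returns one flattened kh*kw patch per valid
    output position (stride 1, no padding)."""
    h, w = len(x), len(x[0])
    cols = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = [x[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            cols.append(patch)
    return cols

def col2im(cols, h, w, kh, kw):
    """Backward of im2col: accumulate each column's values back into an
    H x W grid; overlapping positions sum their contributions."""
    out = [[0.0] * w for _ in range(h)]
    idx = 0
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = cols[idx]; idx += 1
            for di in range(kh):
                for dj in range(kw):
                    out[i + di][j + dj] += patch[di * kw + dj]
    return out
```

The summation over overlaps in `col2im` is exactly why the backward pass is not just the forward pass run in reverse.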
It might be better to wait for @JohannesGaessler to merge #908 and then continue this PR?
That's probably best, considering the changes needed for the tests.
I extended the code in `test-backend-ops` to enable checking gradients from backpropagation against numerically calculated gradients. New tests for gradients should be implemented there if possible (the only thing that currently doesn't work is FP16 support). In principle, all that should be necessary is to add `ggml_set_param` to the existing tests (though tuning the parameters so that you get good numerical precision for the reference values can be tricky).
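The kind of check described here can be sketched in a few lines (illustrative, not the actual `test-backend-ops` code): compare an analytic gradient against a central-difference estimate. The step size `eps` and tolerance `tol` are exactly the sort of parameters that need tuning to get good numerical precision.

```python
def numeric_grad(f, xs, i, eps=1e-4):
    """Central-difference estimate of d f / d xs[i]."""
    xp = list(xs); xp[i] += eps
    xm = list(xs); xm[i] -= eps
    return (f(xp) - f(xm)) / (2.0 * eps)

def check_grads(f, grad_f, xs, tol=1e-5):
    """True if the analytic gradient matches the numeric estimate
    at every parameter index."""
    g = grad_f(xs)
    return all(abs(g[i] - numeric_grad(f, xs, i)) < tol
               for i in range(len(xs)))
```

For example, with `f(xs) = sum(x**2)` the analytic gradient `2*x` passes the check; a wrong gradient fails it.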
Perfect. I plan to finish this PR this weekend.
This PR will add backward computations for most operators once completed. Leaving `pad`, `im2col`, and `norm` for a future PR now. Currently unsure if I should fuse the multiply + gradient computation for `gelu_back`/`gelu_quick_back` like with `silu_back`.
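To make the fusion question concrete, here is a hedged Python sketch of what a fused GELU backward would compute (using the exact erf-based GELU; the function name and shape are illustrative, not ggml's API): instead of materializing `d gelu(x)/dx` and multiplying by the incoming gradient `dy` in a separate op, both happen in one pass.

```python
import math

def gelu(x):
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_back_fused(dy, x):
    """Fused backward: returns dy * d gelu(x)/dx in one pass,
    analogous to how silu_back folds the multiply in.
    d gelu/dx = Phi(x) + x * phi(x), with Phi the standard normal
    CDF and phi its density."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return dy * (cdf + x * pdf)
```

The trade-off is the usual one: fusing saves a temporary tensor and a pass over memory, at the cost of a slightly more specialized backward op.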