Open Drvi opened 5 years ago
+1. This would be helpful for me as well.
I'm happy in principle to just always return sparse gradients in the case of getindex. The challenge is just to decide on a reasonable sparse container and make sure it's GPU-compatible. It pretty much just needs to support broadcast, so it should be straightforward to do.
I'll take a stab at it. If you have anything I can look at to help me get started, I'd appreciate that.
Probably the best thing is to dig through how onehotmatrices work; that's another kind of GPU-compatible sparse type that should be very similar to what you want to do here.
Ah, that's right, I don't actually need cusparse to do sparse things on the GPU. Thanks! I'll try to put something together, probably on Friday.
We now have an efficient one-hot array implementation. General sparse matrix support is better handled in NNlib and the GPU libraries, I think.
An Embedding layer is in the works: #1516
Related to #206 and partly to #66, a feature request: allow sparse gradients for getindex, as PyTorch does for its embedding layer.
I have a large embedding matrix, and the gradient for getindex creates a large array of zeros every time it's called, which kills my GPU performance. I don't think getindex should use sparse data structures in the general case, and I'm not sure what the best API for that is, but this is a big roadblock for me.
I have trouble using views and sparse arrays with CuArrays and Flux (https://github.com/JuliaGPU/CuArrays.jl/issues/267#issue-403606632), so I couldn't really experiment with the idea.
Some API ideas:
a) Define a minimal Embedding struct and use a special gradient definition for getindex into this type. E[:, i] dispatches to the sparse definition of getindex because E isa Embedding.
b) Define a special indexing type, i.e. X[:, sparsely(i)] dispatches to the sparse definition of getindex because of the resulting type of sparsely(i).
c) Define a function sparsegetindex(x, i...) that is just getindex with a sparse gradient definition.

As a workaround I guess I can split the big embedding matrix into multiple small ones, but I'm really not looking forward to working with this kind of setup.
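
To make option (c) a bit more concrete, here is a minimal CPU-only sketch in plain Julia. Every name other than sparsegetindex (SparseColumnGrad, sparsegetindex_grad, sgd_update!, densify) is hypothetical and not part of Flux or Tracker, and the wiring into Tracker's gradient machinery is left out. It only illustrates keeping the looked-up columns instead of a full array of zeros, and it does not address GPU compatibility.

```julia
# Hypothetical sketch; none of these names are Flux or Tracker API.

# Gradient of E[:, idx] with respect to E, stored without the zero columns.
struct SparseColumnGrad{T}
    ncols::Int           # number of columns in the full embedding matrix
    idx::Vector{Int}     # columns that were looked up (repeats allowed)
    vals::Matrix{T}      # upstream gradient, one column per entry of idx
end

# Option (c): the forward pass is an ordinary column lookup.
sparsegetindex(E::AbstractMatrix, idx::AbstractVector{<:Integer}) = E[:, idx]

# Backward pass: keep only the touched columns instead of zeros(size(E)).
sparsegetindex_grad(E::AbstractMatrix, idx, Δ::AbstractMatrix) =
    SparseColumnGrad(size(E, 2), collect(Int, idx), Matrix(Δ))

# Plain SGD step applied in place, touching only the looked-up columns.
function sgd_update!(E::AbstractMatrix, g::SparseColumnGrad, η)
    for (k, j) in enumerate(g.idx)
        @views E[:, j] .-= η .* g.vals[:, k]   # repeated indices accumulate
    end
    return E
end

# Dense equivalent, only for checking the sketch on small inputs.
function densify(g::SparseColumnGrad{T}) where {T}
    G = zeros(T, size(g.vals, 1), g.ncols)
    for (k, j) in enumerate(g.idx)
        @views G[:, j] .+= g.vals[:, k]
    end
    return G
end
```

With this layout the memory for one gradient scales with the number of looked-up columns rather than with the size of the embedding matrix. The column-by-column loop is written for clarity on the CPU; a GPU version would need a scatter-style kernel or a broadcast-friendly container instead.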
Thanks a lot and please let me know if I can help (but my GPU and Tracker knowledge is limited).