FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

Sparse grads #577

Open · Drvi opened 5 years ago

Drvi commented 5 years ago

Related to #206 and partly to #66, a feature request: allow sparse gradients for `getindex`, as PyTorch does for its embedding layer.

I have a large embedding matrix, and the gradient for `getindex` creates a large array of zeros every time it's called, which kills my GPU performance. I don't think `getindex` should use sparse data structures in the general case, and I'm not sure what the best API for this is, but it's a big roadblock for me.
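To make the problem concrete, here is roughly what happens (shown with Zygote, Flux's current AD, which postdates this thread; the sizes are made up):

```julia
using Zygote

W = randn(Float32, 300, 100_000)   # large embedding matrix (hypothetical sizes)
idx = [1, 5, 42]                   # only three columns are actually used

g, = Zygote.gradient(w -> sum(w[:, idx]), W)
size(g)                            # (300, 100000): a dense array that is almost all zeros
```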

I have trouble using views and sparse arrays with CuArrays and Flux (https://github.com/JuliaGPU/CuArrays.jl/issues/267#issue-403606632), so I couldn't really experiment with the idea.

Some API ideas:

a) Define a minimal `Embedding` struct and use a special gradient definition for indexing into this type: `E[:, i]` dispatches to the sparse definition of `getindex` because `E isa Embedding`.
b) Define a special indexing type, i.e. `X[:, sparsely(i)]` dispatches to the sparse definition of `getindex` because of the resulting type of `sparsely(i)`.
c) Define a function `sparsegetindex(x, i...)` that is just `getindex` with a sparse gradient definition (see the sketch after this list).
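For concreteness, here is a minimal CPU-only sketch of idea (c), written against ChainRulesCore rather than Tracker (which Flux used at the time of this thread); `sparsegetindex` is a hypothetical name, not an existing API:

```julia
using ChainRulesCore, SparseArrays

# Hypothetical helper from idea (c): column indexing with a sparse gradient.
sparsegetindex(x::AbstractMatrix, i::AbstractVector{<:Integer}) = x[:, i]

function ChainRulesCore.rrule(::typeof(sparsegetindex), x::AbstractMatrix, i)
    m, n = size(x)
    function sparsegetindex_pullback(ȳ)
        Δ = unthunk(ȳ)
        # Only the indexed columns receive entries; `sparse` sums duplicate
        # (row, col) pairs, which handles repeated indices correctly.
        rows = repeat(1:m, length(i))
        cols = vec([j for _ in 1:m, j in i])
        x̄ = sparse(rows, cols, vec(Δ), m, n)
        return (NoTangent(), x̄, NoTangent())
    end
    return x[:, i], sparsegetindex_pullback
end
```

With a ChainRulesCore-aware AD, `gradient(w -> sum(sparsegetindex(w, idx)), W)` would then return a `SparseMatrixCSC` instead of a dense zero-filled array.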

As a workaround, I guess I can split the big embedding matrix into multiple small ones, but I'm really not looking forward to working with that kind of setup.

Thanks a lot and please let me know if I can help (but my GPU and Tracker knowledge is limited).

datnamer commented 5 years ago

+1. This would be helpful for me as well.

MikeInnes commented 5 years ago

I'm happy in principle to always return sparse gradients for `getindex`. The challenge is to decide on a reasonable sparse container and make sure it's GPU-compatible. It pretty much only needs to support broadcast, so it should be straightforward to do.
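As an illustration of the broadcast requirement: on the CPU, a `SparseMatrixCSC` gradient already broadcasts against a dense parameter, so a plain SGD-style update goes through unchanged (all names below are made up for the example):

```julia
using SparseArrays

W = randn(Float32, 300, 100_000)                              # dense parameter
x̄ = sparse([1, 2], [5, 42], Float32[0.1, -0.2], size(W)...)  # sparse gradient

η = 0.01f0
W .-= η .* x̄   # mixed dense/sparse broadcast works out of the box
```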

Drvi commented 5 years ago

I'll take a stab at it. If you have anything I can look at to help me get started, I'd appreciate it.

MikeInnes commented 5 years ago

Probably the best thing is to dig through how `OneHotMatrix` works; that's another kind of GPU-compatible sparse type, and it should be very similar to what you want to do here.
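For reference, the one-hot formulation looks roughly like this (`onehotbatch` is Flux's API; the sizes are hypothetical):

```julia
using Flux

W = randn(Float32, 300, 100_000)
oh = Flux.onehotbatch([1, 5, 42], 1:100_000)   # OneHotMatrix: stores only the indices

W * oh   # specialized to a column lookup; no dense 100000×3 matrix is materialized
```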

Drvi commented 5 years ago

Ah, that's right: I don't actually need CUSPARSE to do sparse things on the GPU. Thanks! I'll try to put something together, probably on Friday.

ToucheSir commented 3 years ago

We now have an efficient one-hot array implementation. General sparse matrix support is better handled in NNlib and the GPU libraries, I think.

CarloLucibello commented 3 years ago

An `Embedding` layer is in the works: #1516.
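That layer has since shipped; in recent Flux releases, basic usage looks like this (a sketch, as the exact constructor may vary by version):

```julia
using Flux

emb = Flux.Embedding(100_000 => 300)   # vocabulary size => embedding dimension
emb([1, 5, 42])                        # 300×3 matrix: one column per index
```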