google-research / sputnik

A library of GPU kernels for sparse matrix operations.
Apache License 2.0

SPMM #14

Open a1941409241 opened 8 months ago

a1941409241 commented 8 months ago

When initializing the sparse_tile_loader, shouldn't the thread index be threadIdx.x % kBlockWidth rather than threadIdx.x? Is my understanding correct?

tgale96 commented 8 months ago

Hi! We pass threadIdx.x directly (code).

a1941409241 commented 8 months ago

But with subwarp tiling, shouldn't different subwarps correspond to different rows?

tgale96 commented 8 months ago

I believe that is handled in the block configuration passed in for kernel launch.

a1941409241 commented 8 months ago

Should I call CudaSpmm directly to use the library after I run make install? code

a1941409241 commented 8 months ago

And if I don't use a bias, do I just pass nullptr for that argument (or use the default value)? Is that correct?

tgale96 commented 8 months ago

Yes, CudaSpmm is the right API. If you don't need a fused bias + ReLU you can call this API. If you want to fuse those operations, we have CudaSpmmBiasRelu.

a1941409241 commented 8 months ago

Hmm, I have a question. This project selects the SpmmConfig based on the size of the input dense matrix, but that selection adds CPU-side runtime. Calling CudaSpmm directly takes much longer than cuSPARSE for me, yet if I refactor the project and set the SpmmConfig myself, it is faster than cuSPARSE. Doesn't that mean the built-in heuristic is not general enough? Or would it be feasible to add some instances of CudaSpmmEx specialized for particular SpmmConfigs to the library?

tgale96 commented 7 months ago

Interesting! I would guess your problem must be quite small for that to be the case. The tuning heuristics in this library are by no means expected to be good across all problems; if you know which config is best for your problem, you should pass it explicitly.