Open a1941409241 opened 8 months ago
But if subwarp tiling is used, should different subwarps correspond to different rows?
I believe that is handled in the block configuration passed in for kernel launch.
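To make the point about the block configuration concrete, here is a minimal host-side sketch, assuming the kernel is launched with a 2D thread block of shape (kBlockWidth, kBlockItemsY) and that each value of threadIdx.y handles one row. That launch shape, and the helper names `CoordFromLinear` and `RowForThread`, are assumptions for illustration and not confirmed by this thread. Under that assumption, threadIdx.x is already the lane index within the subwarp, and threadIdx.y selects the row.

```cpp
#include <cassert>

// Hypothetical helper types for illustration; not part of the library.
struct ThreadCoord {
  int x;  // plays the role of threadIdx.x (lane within the subwarp)
  int y;  // plays the role of threadIdx.y (which subwarp in the block)
};

// CUDA linearizes a 2D block as linear = x + y * blockDim.x, so x varies
// fastest. With blockDim.x == block_width, a 32-thread warp therefore
// contains 32 / block_width subwarps.
inline ThreadCoord CoordFromLinear(int linear_tid, int block_width) {
  assert(block_width > 0);
  return {linear_tid % block_width, linear_tid / block_width};
}

// Row handled by a thread, assuming one subwarp per row and
// block_items_y rows per thread block (an assumption of this sketch).
inline int RowForThread(int block_idx_x, int block_items_y, int thread_idx_y) {
  return block_idx_x * block_items_y + thread_idx_y;
}
```

For example, with a block width of 8, linear thread 13 of a warp maps to lane 5 of subwarp 1, so it would work on the second row of that block's tile.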
Should I call CudaSpmm directly to use the library after I run make install?
And if I don't use a bias, do I just pass nullptr for that argument or rely on the default value? Is that correct?
Yes, CudaSpmm is the right API. If you don't need a fused bias + relu you can call this API. If you want to fuse the operations we have CudaSpmmBiasRelu.
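For readers comparing the two entry points, the following is a small CPU reference of what a plain SpMM and a fused bias + ReLU SpMM compute. It is only a sketch: the function names `SpmmReference` and `SpmmBiasReluReference` are hypothetical, the per-row bias layout is an assumption, and none of this reflects the library's actual signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference for C = A * B, with A (m x k) stored in CSR form and
// B (k x n) dense and row-major. Returns C (m x n), row-major.
std::vector<float> SpmmReference(int m, int n,
                                 const std::vector<int>& row_offsets,
                                 const std::vector<int>& column_indices,
                                 const std::vector<float>& values,
                                 const std::vector<float>& dense) {
  assert(static_cast<int>(row_offsets.size()) == m + 1);
  std::vector<float> out(m * n, 0.0f);
  for (int i = 0; i < m; ++i) {
    for (int idx = row_offsets[i]; idx < row_offsets[i + 1]; ++idx) {
      const int col = column_indices[idx];
      for (int j = 0; j < n; ++j) {
        out[i * n + j] += values[idx] * dense[col * n + j];
      }
    }
  }
  return out;
}

// Same product followed by a bias add and ReLU. One bias value per output
// row is an assumption made for this sketch.
std::vector<float> SpmmBiasReluReference(int m, int n,
                                         const std::vector<int>& row_offsets,
                                         const std::vector<int>& column_indices,
                                         const std::vector<float>& values,
                                         const std::vector<float>& dense,
                                         const std::vector<float>& bias) {
  assert(static_cast<int>(bias.size()) == m);
  std::vector<float> out =
      SpmmReference(m, n, row_offsets, column_indices, values, dense);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      out[i * n + j] = std::max(out[i * n + j] + bias[i], 0.0f);
    }
  }
  return out;
}
```

A reference like this can be handy for checking GPU results on small inputs, whichever of the two APIs is called.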
Emmm, I have a question. This project chooses the SpmmConfig based on the size of the input dense matrix, but that selection adds runtime on the CPU. Calling CudaSpmm directly takes much longer than cuSPARSE for me, but if I refactor the project and set the SpmmConfig myself, it is faster than cuSPARSE. Does this mean the general-purpose path is not good enough, or should I suggest adding some instances of CudaSpmmEx specialized for particular SpmmConfig values to the library? Is this feasible?
Interesting! I would think your problem must be quite small for that to be the case? The tuning heuristics in this library are by no means expected to be good across all problems and if you know what config is best for your problem you should pass that explicitly.
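The difference between a runtime heuristic and an explicitly chosen configuration can be sketched on the host side as follows. Everything here is hypothetical: `TileConfig`, `LaunchShape`, `ShapeFor`, `ShapeFromHeuristic`, the tile sizes, and the thresholds are made up for illustration and are not the library's heuristics or types.

```cpp
#include <cassert>

// Hypothetical compile-time tile configuration (illustrative names only).
template <int BlockItemsY, int BlockItemsX, int BlockWidth>
struct TileConfig {
  static constexpr int kBlockItemsY = BlockItemsY;
  static constexpr int kBlockItemsX = BlockItemsX;
  static constexpr int kBlockWidth = BlockWidth;
};

using NarrowConfig = TileConfig<4, 16, 8>;  // made-up sizes
using WideConfig = TileConfig<4, 32, 8>;    // made-up sizes

// Grid and block dimensions a launch with a given config would use.
struct LaunchShape {
  int grid_x;
  int grid_y;
  int block_x;
  int block_y;
};

// Explicit path: the caller names the config at compile time.
template <typename Config>
LaunchShape ShapeFor(int m, int n) {
  assert(m > 0 && n > 0);
  return {(m + Config::kBlockItemsY - 1) / Config::kBlockItemsY,
          (n + Config::kBlockItemsX - 1) / Config::kBlockItemsX,
          Config::kBlockWidth, Config::kBlockItemsY};
}

// Heuristic path: a runtime branch on the problem size picks the config.
// The threshold is invented for this sketch.
inline LaunchShape ShapeFromHeuristic(int m, int n) {
  if (n <= 16) return ShapeFor<NarrowConfig>(m, n);
  return ShapeFor<WideConfig>(m, n);
}
```

If the best configuration for a problem is already known, calling the templated path directly (here `ShapeFor<WideConfig>(m, n)`) skips the runtime selection and lets the caller pick a tile shape that a generic heuristic might not choose.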
When initializing the sparse_tile_loader, the threadIdx.x should be threadIdx.x % kBlockWidth. Is that correct?