YukeWang96 / TC-GNN_ATC23

Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.

CUDA Graph optimization #5

Open plant310 opened 1 year ago

plant310 commented 1 year ago

Hi, I used Nsight Systems to view the timeline after enabling CUDA Graphs and found that the SpMM kernels from the forward and backward passes are clustered together, which seems to break the logic of the program. Is there a solution for this? [image]

YukeWang96 commented 1 year ago

Do these two SpMM functions correspond to the two-layer forward of the GCN model?

plant310 commented 1 year ago

The dependency between the combination and aggregation operations seems to be broken. I also compared the test accuracy with and without the CUDA Graph optimization: with it enabled, the test accuracy drops to a very low level.
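One way to narrow this down might be to check whether a graph replay reproduces the eager result for a single forward pass. The following is only a minimal sketch using PyTorch's `torch.cuda.CUDAGraph` API, not the capture code from this repository. The helper name `eager_vs_graph_max_diff` and the dense toy model are hypothetical stand-ins, and the check covers the forward pass only, not the backward pass.

```python
import torch


def eager_vs_graph_max_diff(model, example_input):
    """Return max |eager - graph replay| for one forward pass, or None without CUDA.

    Hypothetical helper for illustration; not part of TC-GNN.
    """
    if not torch.cuda.is_available():
        return None  # CUDA Graphs need a GPU; nothing to compare on CPU.

    model = model.cuda().eval()
    static_input = example_input.cuda().clone()

    with torch.no_grad():
        # Reference result from ordinary (eager) execution.
        eager_out = model(static_input).clone()

        # Warm up on a side stream before capture, as the PyTorch docs recommend.
        side = torch.cuda.Stream()
        side.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side):
            for _ in range(3):
                model(static_input)
        torch.cuda.current_stream().wait_stream(side)

        # Capture one forward pass; kernels are recorded here, not executed.
        graph = torch.cuda.CUDAGraph()
        with torch.cuda.graph(graph):
            static_out = model(static_input)

        # Replay runs the recorded kernels on the same static buffers.
        graph.replay()
        torch.cuda.synchronize()

    return (eager_out - static_out).abs().max().item()


# Toy dense model as a stand-in (assumption); swap in the real model to test it.
toy = torch.nn.Sequential(
    torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4)
)
diff = eager_vs_graph_max_diff(toy, torch.randn(32, 16))
print(diff)
```

If the difference is near zero for a model built from standard PyTorch layers but large once the custom SpMM op is involved, that would suggest the custom kernels are not being captured or ordered as expected, although this check alone does not identify the cause.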