fishmingyu opened 9 months ago
Thank you for your interest in our work!
I agree that the experimental results could be biased by framework overhead, since we did not account for it when designing the baselines. For DGL, we run the SpMM through the `update_all` function under `torch.no_grad`.
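For reference, a minimal sketch of what that baseline call might look like (the graph size, feature width, and field names here are illustrative, not the ones used in our benchmarks):

```python
import torch
import dgl
import dgl.function as fn

# Illustrative random graph and dense node features (sizes are placeholders)
g = dgl.rand_graph(1000, 10000).to('cuda')
feat = torch.randn(g.num_nodes(), 64, device='cuda')

with torch.no_grad():
    g.ndata['h'] = feat
    # SpMM expressed as message passing: copy source features, sum on destination
    g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_out'))
    out = g.ndata['h_out']
```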
It is also expected that our implementation does not outperform cuSPARSE, since the main purpose of the experiments was to validate our observations.
Unfortunately, due to a career change, I do not have the bandwidth to provide more results for an accurate comparison with the baselines. However, I am happy to answer any questions you may have about the technical details of our work.
Thanks again for your feedback!
Your kernel implementation appears to be more straightforward than anticipated. Unfortunately, it does not outperform cuSPARSE or other rapid sparse kernel libraries in terms of efficiency. Concerning your benchmarks, the evaluation seems biased, as all your test files focus on single-layer operations without considering any framework overhead, whereas your baseline measurements are taken within a framework context. For a more equitable comparison, could you supply relevant artifacts or adjust your testing methodology to include framework overheads in both your implementation and the baseline?
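For example, timing both paths with the same harness would make the comparison consistent. A minimal sketch, assuming PyTorch CUDA events and hypothetical function names for the custom kernel:

```python
import torch

def time_gpu(func, warmup=10, iters=100):
    """Average wall-clock time (ms) of a GPU callable, including Python launch overhead."""
    for _ in range(warmup):
        func()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        func()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Hypothetical usage: measure both implementations with the same harness so that
# framework overhead is either included in both or excluded from both.
# t_custom = time_gpu(lambda: custom_spmm(row_ptr, col_idx, vals, feat))  # hypothetical kernel API
# t_dgl    = time_gpu(lambda: g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_out')))  # DGL path
```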