Open mrzzmrzz opened 1 year ago
That's a good catch! Which PyTorch version did you observe this speedup on?
CSR is more efficient for matrix multiplication, while COO is more efficient for editing sparse matrices. We were not confident about the coverage of CSR in PyTorch, so we fell back to COO everywhere. If CSR is well supported by PyTorch now, we will update TorchDrug accordingly. That would bring a large speedup to many GNN models.
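To make the trade-off concrete, here is a minimal sketch on a toy graph. It is not TorchDrug code, and the tensor names are made up for illustration. It edits the adjacency matrix in COO form, converts it to CSR once, and checks that both layouts give the same product. It assumes a PyTorch build where `to_sparse_csr()` and CSR matrix multiplication are available.

```python
import torch

# Toy graph with 4 nodes; edges are stored as (row, col) pairs in COO form.
indices = torch.tensor([[0, 1, 2], [1, 2, 3]])
values = torch.ones(3, dtype=torch.float64)

# COO is easy to edit: adding an edge is a concatenation of indices and values.
new_edge = torch.tensor([[3], [0]])
indices = torch.cat([indices, new_edge], dim=1)
values = torch.cat([values, torch.ones(1, dtype=torch.float64)])

adj_coo = torch.sparse_coo_tensor(indices, values, size=(4, 4)).coalesce()

# Convert once, after editing is finished, and reuse the CSR tensor for matmul.
adj_csr = adj_coo.to_sparse_csr()

x = torch.arange(8, dtype=torch.float64).reshape(4, 2)  # dense node features
out_coo = torch.mm(adj_coo, x)  # sparse COO @ dense
out_csr = torch.mm(adj_csr, x)  # sparse CSR @ dense

print(torch.__version__)
print(torch.allclose(out_coo, out_csr))
```

The point of the sketch is only that the conversion can happen after all edits are done, so the two layouts do not have to compete.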
I found that sparse tensor multiplication is very slow in the `GearNet` module. The relevant code is in the `message_and_aggregate` method.
When I replaced the original COO sparse tensor with a CSR sparse tensor, the time spent running GearNet to predict protein labels dropped by about 50%, e.g., from 16 minutes per epoch to 8 minutes per epoch on a single RTX 3090 (batch size 8).
I'm not sure whether this problem is caused by my own GPU device or the type of sparse tensor. If it's the latter, maybe I can open a pull request for it.
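One way to tell the device apart from the tensor layout is a small standalone micro-benchmark, independent of TorchDrug. The sketch below is only an illustration: the matrix sizes are arbitrary, `time_spmm` is a hypothetical helper, and the result will vary with the PyTorch version, the device, and the sparsity pattern, so it makes no claim about which layout wins.

```python
import time
import torch


def time_spmm(adj, x, repeats=10):
    """Return the average seconds per sparse @ dense product."""
    torch.mm(adj, x)  # warm-up run, excluded from the timing
    if x.device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        torch.mm(adj, x)
    if x.device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_nodes, num_edges, dim = 2000, 20000, 64  # arbitrary illustrative sizes

# Random sparse adjacency matrix, built in COO and converted to CSR.
indices = torch.randint(0, num_nodes, (2, num_edges), device=device)
values = torch.rand(num_edges, device=device)
adj_coo = torch.sparse_coo_tensor(indices, values, (num_nodes, num_nodes)).coalesce()
adj_csr = adj_coo.to_sparse_csr()
x = torch.rand(num_nodes, dim, device=device)

t_coo = time_spmm(adj_coo, x)
t_csr = time_spmm(adj_csr, x)
print(f"PyTorch {torch.__version__} on {device}")
print(f"COO: {t_coo * 1e3:.3f} ms  CSR: {t_csr * 1e3:.3f} ms")
```

Posting the printed version and timings from a few machines would help answer the version question above.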