yeliu918 opened this issue 4 years ago
TVM can be used in multiple different ways. The blog post you mentioned is about using TVM to compile an existing PyTorch model into one big optimized binary. This can make the model a bit faster, but it doesn't solve the O(n^2) problem of self-attention. As you said, it also doesn't take sparse tensors into account (we don't use sparse tensors anyway).
What we used is a lower-level TVM construct that lets you write your own CUDA kernel, compile it into a binary, and then call it as if it were a regular PyTorch function. So yes, as you said, it is our implementation of the CUDA kernel that makes it possible to compute only the non-zero values. Our code is similar to the three nested loops of a regular matrix multiplication, but it only computes certain diagonals of the output tensor, then stores them as columns in a tensor with some padding.
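For illustration, here is a rough NumPy sketch of the kind of computation those loops perform (this is not the actual TVM kernel, which is written as tensor expressions and scheduled for the GPU). Only the 2*w + 1 diagonals of Q @ K^T around the main diagonal are computed, and each diagonal is stored as a column of the output with padding at the sequence boundaries. The function name, shapes, and padding convention are my own assumptions:

```python
import numpy as np

def diagonaled_mm_reference(Q, K, w, pad_value=0.0):
    """Naive reference for a banded (diagonaled) Q @ K^T.

    Q, K: (seq_len, head_dim) arrays for a single head (assumed shapes).
    w: one-sided window size, so only 2*w + 1 diagonals are computed.
    Returns a (seq_len, 2*w + 1) array where column c holds the diagonal
    at offset (c - w), padded with pad_value where that offset falls
    outside the sequence.
    """
    n, d = Q.shape
    out = np.full((n, 2 * w + 1), pad_value, dtype=Q.dtype)
    for i in range(n):                  # query position (output row)
        for c in range(2 * w + 1):      # diagonal index (output column)
            j = i + c - w               # key position on that diagonal
            if 0 <= j < n:
                acc = 0.0
                for k in range(d):      # innermost reduction over head_dim
                    acc += Q[i, k] * K[j, k]
                out[i, c] = acc
    return out
```

Because the loop over c only runs over 2*w + 1 values instead of the full sequence length, the cost is O(n * w * d) rather than O(n^2 * d), which is the point of the diagonaled multiplication.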
Thanks for the clarification. Could you point me to the code that implements the three nested loops of the regular matrix multiplication? I guess it's in tvm/libtvm_runtime.so?
All our TVM code is here: https://github.com/allenai/longformer/blob/master/longformer/diagonaled_mm_tvm.py, and the nested loops are in these lines: https://github.com/allenai/longformer/blob/master/longformer/diagonaled_mm_tvm.py#L52-L82
The code under https://github.com/allenai/longformer/tree/master/tvm, which compiles into libtvm_runtime.so, is copied from the TVM library and is only used to load and run the compiled binaries.
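In case the glue-code side is useful, here is a rough sketch (not copied from the repo) of how a compiled TVM kernel can be loaded at runtime and exposed to PyTorch through DLPack. The file name, exported function name, shapes, and argument list are assumptions, and the exact loading call depends on the TVM version:

```python
import torch
import tvm
from tvm.contrib import dlpack

# Load the precompiled kernel binary (file and function names are assumptions).
mod = tvm.runtime.load_module("diagonaled_mm.so")
tvm_fn = mod["diagonaled_mm"]

# Wrap the TVM function so it accepts PyTorch tensors directly via DLPack.
diagonaled_mm = dlpack.to_pytorch_func(tvm_fn)

# TVM kernels typically write into a preallocated output tensor.
# The real kernel's argument list differs; this only illustrates the wrapping.
w = 256
q = torch.randn(4096, 64, device="cuda")
k = torch.randn(4096, 64, device="cuda")
out = q.new_zeros(4096, 2 * w + 1)
diagonaled_mm(q, k, out)  # fills `out` with the banded scores
```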
Thanks for the quick response! Much appreciated!
Hi,
Nice work! Very interesting and useful!
I didn't know about the Tensor Virtual Machine (TVM) before, so I checked this blog post about TVM: https://tvm.apache.org/2020/07/14/bert-pytorch-tvm
From my understanding, the original design of TVM can't compute and store only the non-zero values. So is it your implementation that makes Longformer compute and store only the non-zero values? I want to try this idea on our own model. Could you give me more hints about how your model achieves that?
Best, Ye