ant-research / Pyraformer

About args use_tvm #24

htg17 commented 1 year ago

In long_range_main.py the use_tvm arg defaults to FALSE, and the sample scripts never enable it. But if this arg is FALSE, it seems that pyramidal attention, which is the main contribution of the paper, is not used anywhere in the model.

So should this arg be set to TRUE when I want to use pyramidal attention to save computation cost?

Zhazhan commented 1 year ago

We provide two implementations of pyramidal attention: the naive version and the TVM version. The naive version does not reduce the time and space complexity. Because the TVM version may require the user to compile TVM, we set use_tvm=False by default to make it easier to reproduce our results.
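
To make the complexity point concrete, here is a minimal PyTorch sketch (not the repository's code): the naive path first materializes the full L x L score matrix and only then applies the mask, so time and memory stay quadratic in the sequence length no matter how sparse the pyramidal pattern is.

```python
import torch

# Toy illustration (not the repository's code): even with a sparse attention
# pattern, the naive path first builds the full (L x L) score matrix and only
# then masks it, so cost stays quadratic in L.
L, d = 4096, 64
q = k = torch.randn(L, d)
scores = q @ k.transpose(-2, -1)               # O(L^2) time and memory
mask = torch.eye(L, dtype=torch.bool)          # placeholder for the pyramidal mask
scores = scores.masked_fill(~mask, float('-inf'))
print(scores.shape)                            # torch.Size([4096, 4096]), independent of mask sparsity
```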

If you want to use the TVM implementation without compiling TVM, please set use_tvm=True and make sure that (1) the operating system is Ubuntu and (2) the CUDA version is 11.1. Otherwise, you can compile TVM 0.8.0 yourself by following the official guide at https://tvm.apache.org/docs/.
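
As a rough pre-flight check before enabling the pre-compiled kernels, something like the following sketch can be used (the Ubuntu detection via platform.version() and the torch.version.cuda comparison are heuristics assumed here, not checks from the repository):

```python
import platform
import torch

# Rough pre-flight sketch (a heuristic, not part of the repository):
# the shipped binaries are said above to target Ubuntu + CUDA 11.1.
is_ubuntu = 'ubuntu' in platform.version().lower()   # crude OS check; may miss some setups
cuda_ok = torch.version.cuda == '11.1'               # CUDA version PyTorch was built against
print('use_tvm=True without compiling TVM should work:', is_ubuntu and cuda_ok)
```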

If compiling TVM is too much trouble, you can instead use a pre-built TVM Docker image from https://tvm.apache.org/docs/install/docker.html#docker-source. Then delete the files under 'pyraformer/lib' and run the code again.

htg17 commented 1 year ago

Thanks for answering. I just wonder whether the naive option still uses pyramidal attention.

If use_tvm=FALSE, the MultiHeadAttention in SubLayers.py is used as the self-attention module. But it seems that MultiHeadAttention is just vanilla attention.

Zhazhan commented 1 year ago

The naive implementation realizes pyramidal attention by adding an attention mask to the attention score matrix. The 'MultiHeadAttention' module is indeed vanilla attention; the differences lie in the 'Encoder' module. Please refer to lines 19-22 and 51-54 in pyraformer/Pyraformer_LR.py.
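
To illustrate the mechanism, here is a minimal sketch under a simplified pyramid layout (not the repository's Encoder or mask-building code): coarser-scale 'parent' nodes are appended after the fine-scale sequence, and a boolean mask allows only intra-scale neighbors and parent/child pairs to attend to each other; everything else is set to -inf before the softmax, so vanilla attention plus this mask behaves as pyramidal attention.

```python
import torch

def toy_pyramidal_mask(seq_len, window=1, stride=4):
    """Build an illustrative boolean mask over seq_len fine-scale nodes plus
    seq_len // stride coarse-scale 'parent' nodes (True = attention allowed).
    This is a simplified stand-in for the repository's mask construction."""
    n_coarse = seq_len // stride
    total = seq_len + n_coarse
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Intra-scale links: each fine-scale node attends to itself and its neighbors.
    for i in range(seq_len):
        mask[i, max(0, i - window):min(seq_len, i + window + 1)] = True
    # Inter-scale links: each coarse node and its children attend to each other.
    for j in range(n_coarse):
        parent = seq_len + j
        mask[parent, parent] = True
        for child in range(j * stride, (j + 1) * stride):
            mask[child, parent] = True
            mask[parent, child] = True
    return mask

def masked_attention(q, k, v, mask):
    # Vanilla scaled dot-product attention; the pyramidal structure comes purely from the mask.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float('-inf'))
    return torch.softmax(scores, dim=-1) @ v

seq_len, d_model = 8, 16
mask = toy_pyramidal_mask(seq_len)                  # shape (10, 10) for stride=4
x = torch.randn(seq_len + seq_len // 4, d_model)    # fine-scale + coarse-scale nodes
out = masked_attention(x, x, x, mask)
print(out.shape)                                    # torch.Size([10, 16])
```

In the actual repository the mask encodes the full multi-scale pyramid and the coarser-scale nodes are produced by the convolution-based coarser-scale construction module described in the paper, but the key point is the same: what makes the attention pyramidal is the mask, not the MultiHeadAttention module itself.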