google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

Linear Transformer code base #23

Closed maximzubkov closed 3 years ago

maximzubkov commented 3 years ago

Hello!

Thank you for your work on long-range-arena, it's impressive! My name is Maksim Zubkov, and I am working on improving the Linear Transformer proposed in the paper Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Angelos Katharopoulos et al., 2020). I want to compare the results of the vanilla Linear Transformer with those achieved by the model I propose, so it would be very useful for my research to get access to the code base of the Linear Transformer you used in LRA. I have the following questions:

  1. How soon do you plan to publish the code of the Linear Transformer used in the experiments?
  2. Have you changed the mechanism proposed in the original repo in any way, or did you basically change attention_fn in nn.SelfAttention?

Best regards, Maksim

MostafaDehghani commented 3 years ago

Thank you for your interest in LRA.

  1. How soon do you plan to publish the code of the Linear Transformer used in the experiments?

You can find the code at: https://github.com/google-research/long-range-arena/tree/main/lra_benchmarks/models/linear_transformer

  2. Have you changed the mechanism proposed in the original repo in any way, or did you basically change attention_fn in nn.SelfAttention?

I think the code answers that, but we replace the whole SelfAttention module: https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/models/linear_transformer/linear_attention.py#L75
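
For reference, here is a minimal sketch of the non-causal linear attention from Katharopoulos et al. (2020) in plain JAX, not the actual Flax module linked above; the function name, shapes, and epsilon are illustrative assumptions:

```python
import jax.numpy as jnp
from jax import nn


def linear_attention(q, k, v, eps=1e-6):
    """Sketch of kernelized (linear) attention.

    q, k, v: arrays of shape [seq_len, num_heads, head_dim].
    Cost is linear in seq_len because keys and values are aggregated
    before being combined with the queries.
    """
    # Positive feature map phi(x) = elu(x) + 1 from the paper.
    q = nn.elu(q) + 1.0
    k = nn.elu(k) + 1.0
    # Aggregate keys and values once: [num_heads, head_dim, head_dim].
    kv = jnp.einsum('shd,shm->hdm', k, v)
    # Per-query normalizer: phi(Q) dot (sum over positions of phi(K)).
    z = 1.0 / (jnp.einsum('shd,hd->sh', q, k.sum(axis=0)) + eps)
    # Combine queries with the aggregated key-value product.
    return jnp.einsum('shd,hdm,sh->shm', q, kv, z)
```

The LRA model wires an equivalent computation into a Flax module that drops in where nn.SelfAttention would otherwise be used, so the rest of the Transformer stack stays unchanged.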

maximzubkov commented 3 years ago

Thank you very much!