google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

Q's on Performer & Text Classification #19

Open Muennighoff opened 3 years ago

Muennighoff commented 3 years ago

Thanks for the great work. I had a couple of questions while trying to reproduce the Performer results on byte-level text classification:

  1. Which kernel function are you using (softmax approximation or ReLU)?
  2. I found training to be very unstable. Do you take the final model after 20K steps, or the best checkpoint?
  3. With the learning rate scheduler you use, the learning rate is 0 when the first step is 0, isn't it? Shouldn't the training loop instead start with for step in range(1, X) in https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/text_classification/train.py?
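
To illustrate question 3, here is a minimal sketch of the concern. It assumes the schedule multiplies a base rate by a linear warmup factor and an inverse-square-root decay. The function name learning_rate and the values of base_lr and warmup_steps are hypothetical and are not taken from the repository.

```python
import math


def learning_rate(step, base_lr=0.05, warmup_steps=8000):
    """Hypothetical schedule: constant * linear warmup * rsqrt decay.

    This is an assumed form for illustration, not the repository's code.
    """
    # Linear warmup: grows from 0 at step 0 to 1 at warmup_steps.
    warmup = min(1.0, step / warmup_steps)
    # Inverse square-root decay, held constant during warmup.
    decay = 1.0 / math.sqrt(max(step, warmup_steps))
    return base_lr * warmup * decay


# Under this assumed schedule, a loop starting at step 0 makes its
# first update with a learning rate of exactly zero.
lr_at_step_0 = learning_rate(0)
# Starting at step 1 gives a small but non-zero rate instead.
lr_at_step_1 = learning_rate(1)

print(lr_at_step_0, lr_at_step_1)
```

If the schedule really has this shape, starting at step 0 only wastes the first update, so the effect on a 20K-step run is probably small.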

Looking forward to the implementations of the other models, thanks!

jinfengr commented 3 years ago

FYI: the implementations of all models are available now.