Smerity / sha-rnn

Single Headed Attention RNN - "Stop thinking with your head"

Results from Adaptive Span Transformer #11

Closed: djstrong closed this issue 4 years ago

djstrong commented 4 years ago

In the readme the Adaptive Span Transformer is referred to as the small model, but it is actually the normal version. The small version achieves about 1.3 BPC (BPB, actually).
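For context on the units being discussed: bits per character (BPC) is just the model's average cross-entropy loss expressed in base 2, and on a byte-level corpus like enwik8 it coincides with bits per byte (BPB). A minimal sketch of the conversion, assuming a loss reported in nats per byte-level token (the function and variable names here are illustrative, not from either repository):

```python
import math

def loss_to_bpc(nats_per_token: float) -> float:
    """Convert average cross-entropy (nats per token) to bits per character.

    On enwik8 each token is a single byte, so bits per character
    and bits per byte (BPB) are the same quantity.
    """
    return nats_per_token / math.log(2)

# A dev loss of roughly 0.90 nats/byte corresponds to ~1.3 bpc,
# the figure quoted later in this thread for the smaller 8-layer model.
print(loss_to_bpc(0.90))  # ~1.2984
```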

Smerity commented 4 years ago

Where are you seeing this? I'm looking at the paper's enwik8 results in Table 2. Maybe you were looking at Table 1, which is text8? I can't see 1.3 bpc anywhere, and a result that high would be worse than the state of the art from two or three years ago.

Thanks for keeping an eye out though!

https://arxiv.org/abs/1905.07799


djstrong commented 4 years ago

You are right; in the paper they present small and large models.

However, I was looking at their repo (https://github.com/facebookresearch/adaptive-span):

Scripts for running experiments in the paper are located in the ./experiments/ directory. For example, a smaller 8-layer version of our model can be trained on a single GPU by running:

bash experiments/enwik8_small.sh

It should reach about 1.3 bpc on dev after 150k steps.

...

Experiment     #params   dev (bpc)   test (bpc)
enwik8         38M       1.04        1.02
enwik8_large   209M      1.00        0.98
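To make those numbers concrete: enwik8 is the first 10^8 bytes of English Wikipedia, so a bpc figure maps directly to an equivalent compressed size. A quick back-of-the-envelope illustration using the figures above (the byte count comes from the standard enwik8 benchmark definition, not from this thread):

```python
# Compressed-size equivalent implied by a bits-per-character figure on
# enwik8. The corpus is 10^8 bytes and is modeled byte by byte, so
# bpc here is also bits per byte.
ENWIK8_BYTES = 100_000_000

def implied_size_mb(bpc: float, n_bytes: int = ENWIK8_BYTES) -> float:
    # total bits = bpc * bytes; divide by 8 for bytes, by 1e6 for megabytes
    return bpc * n_bytes / 8 / 1e6

for name, bpc in [("enwik8 test", 1.02),
                  ("enwik8_large test", 0.98),
                  ("small script, dev", 1.3)]:
    print(f"{name}: ~{implied_size_mb(bpc):.2f} MB")  # 12.75, 12.25, 16.25
```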

The experiments directory contains the scripts enwik8_small.sh, enwik8.sh, and enwik8_large.sh. So the "small" model from the paper corresponds to enwik8.sh. That is what confused me.
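Summarizing the naming mismatch as a small lookup table, with the descriptions inferred from this thread (a hypothetical summary for clarity, not something stated in either repository):

```python
# How the adaptive-span script names line up with the models in the
# paper (arXiv:1905.07799), as inferred from this thread.
SCRIPT_TO_PAPER_MODEL = {
    "experiments/enwik8_small.sh": "repo-only 8-layer variant, ~1.3 dev bpc",
    "experiments/enwik8.sh":       "the paper's 'small' model, 38M params, 1.02 test bpc",
    "experiments/enwik8_large.sh": "the paper's 'large' model, 209M params, 0.98 test bpc",
}

for script, model in SCRIPT_TO_PAPER_MODEL.items():
    print(f"{script} -> {model}")
```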