Closed. djstrong closed this issue 4 years ago.
Where are you seeing this? I'm looking at the paper for enwik8 results in Table 2. Maybe you were looking at Table 1 which is text8? Nowhere can I see 1.3 bpc and that result would be higher than SotA two or three years ago.
Thanks for keeping an eye out though!
https://arxiv.org/abs/1905.07799
On Sat, Jan 25, 2020 at 12:20 AM djstrong notifications@github.com wrote:
In the readme the Adaptive Span Transformer is referred to as "small", but it is actually the normal version. The small version achieves 1.3 BPC (BPB, actually).
--
Regards, Stephen Merity
You are right; in the paper they present small and large models.
However, I was looking at their repo (https://github.com/facebookresearch/adaptive-span):
Scripts for running experiments in the paper are located in ./experiments/ directory. For example, a smaller 8-layer version of our model can be trained on a single GPU by running:
bash experiments/enwik8_small.sh
It should reach about 1.3 bpc on dev after 150k steps...
| Experiment | #params | dev (bpc) | test (bpc) |
| --- | --- | --- | --- |
| enwik8 | 38M | 1.04 | 1.02 |
| enwik8_large | 209M | 1.00 | 0.98 |
In the experiments directory there are three scripts: enwik8_small.sh, enwik8.sh, and enwik8_large.sh. So the "small" model from the paper is enwik8.sh. That confused me.
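For context on the BPC/BPB distinction above: both are the average cross-entropy in bits per symbol, and frameworks usually report the loss in nats, so the conversion is just a division by ln(2). On a byte-level dataset such as enwik8 the two coincide, since each "character" is a byte. A minimal sketch (the example loss value is illustrative, not from the paper):

```python
import math

def nats_to_bpc(loss_nats: float) -> float:
    """Convert an average cross-entropy loss in nats to bits-per-character.

    On byte-level data (e.g. enwik8) the same quantity is often called
    bits-per-byte (bpb), since each character is a single byte.
    """
    return loss_nats / math.log(2)

# Illustrative: a dev loss of ~0.72 nats corresponds to ~1.04 bpc.
print(round(nats_to_bpc(0.72), 2))
```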