reproduce arxiv classification task

google-research / bigbird

Transformers for Longer Sequences

Apache License 2.0

563 stars 101 forks source link

We try to reproduce arxiv task with f1 92 as shown in the paper, we are using default hyperparameters defined in bigbird/classifier/base_size.sh, pretrained checkpoint here, but with batch size = 2 due to memory limitation (total batch size = 8gpu 2 = 16), after 16k steps (16000 16 / 30034 = 8.5 epoch), but only get f1 84 in the end, which is too low compare to the paper which is trained by 10 epochs. Did we missing something? preprocessing of Arxiv? or just because of the batch size is too small? Will you release the checkpoint of Arxiv in the future?

About the difference of dataset, we have finetune roberta on the same arxiv dataset and get f1 86, pretty close the the paper.

google-research / bigbird

reproduce arxiv classification task #20