Closed NProkoptsev closed 4 years ago
In https://github.com/pytorch/fairseq/tree/master/examples/xlmr, xlm.base.tar.gz weighs 2.4 GB, while xlm.large.tar.gz weighs only 900 MB. It seems that the base model archive contains unnecessary optimizer state.
Nice catch, I replaced it with the stripped version (without optimizer state).
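For anyone who hits the same thing with their own checkpoints, here is a rough sketch of how the optimizer state could be stripped. It assumes the checkpoint is a plain dict saved with `torch.save` and that the optimizer state sits under a top-level `last_optimizer_state` key; the helper names `strip_optimizer_state` and `strip_checkpoint_file` are made up for this example, and this is not necessarily how the released archive was produced.

```python
def strip_optimizer_state(checkpoint, keys=("last_optimizer_state",)):
    """Return a shallow copy of a checkpoint dict without the given keys.

    The default key name is an assumption about the checkpoint layout;
    pass other names if your checkpoint stores the optimizer elsewhere.
    """
    return {k: v for k, v in checkpoint.items() if k not in keys}


def strip_checkpoint_file(src_path, dst_path):
    """Load a checkpoint from src_path and save a stripped copy to dst_path."""
    # Imported lazily so strip_optimizer_state can be used without PyTorch.
    import torch

    # Recent PyTorch releases may need weights_only=False here if the
    # checkpoint pickles non-tensor objects such as training arguments.
    checkpoint = torch.load(src_path, map_location="cpu")
    torch.save(strip_optimizer_state(checkpoint), dst_path)
```

Adam-style optimizers keep extra buffers per parameter, so a checkpoint saved with optimizer state can end up several times larger than the weights alone, which would be consistent with the base archive being bigger than the large one.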