Is there any way the code for pre-training and fine-tuning the transformer model could be shared? Alternatively, could anyone suggest resources for writing scripts that replicate the training section of this paper? I would like to adopt the paper's algorithm and train the language model on my own dataset. Thanks a lot.