code for pre-training and fine-tuning the transformer model

Could the code for pre-training and fine-tuning the transformer model be shared? If not, could anyone suggest resources for writing scripts that replicate the training section of the paper? I would like to adopt the paper's algorithm and train the language model on my own dataset. Thanks a lot.

google-deepmind / alphageometry

code for pre-training and fine-tuning the transformer model #91