friesel opened this issue 3 years ago
Hi, can you specify which project you're directing your question to?
Sorry, my question is directed to the Perceiver IO project team.
In the NLP world, pretrained models are often either English-only or cover "all the world's languages". Many users, however, need inference in non-English languages and have one or two GPUs rather than TPU pods, so for them it is most efficient to pretrain only in the language they actually need inference in. For both pretraining and fine-tuning it would therefore be great to have the scripts you used to pretrain the masked LM available.
Thx
Hi, thanks for your interest in Perceiver IO. We do not plan to open source the training scripts for the masked LM, because they are heavily tied to our internal infrastructure for training these models at scale. We have released an example training pipeline for ImageNet, as well as the exact configuration we used for language modeling from bytes (in the language modeling colab; a rough sketch of the byte-level input is below), which hopefully will be of use if you wish to train a new language model from scratch for other languages.
Do let us know if you have any further questions or if you encounter any issues trying to replicate our work!
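For concreteness, "modeling from bytes" just means the inputs are raw UTF-8 byte values plus a handful of reserved special IDs, so there is no language-specific vocabulary to rebuild for a new language. A rough sketch of that input encoding (the reserved-ID count and layout below are assumptions, not the exact released tokenizer):

```python
# Rough illustration of byte-level tokenization (an assumed layout, not the
# repo's exact tokenizer): text becomes raw UTF-8 byte values, shifted up to
# reserve a few IDs for special tokens, so no subword vocabulary is needed.
NUM_SPECIAL = 6  # assumed count of reserved IDs (e.g. PAD/BOS/EOS/MASK/CLS/SEP)

def to_ids(text: str) -> list[int]:
    """Encode a string as shifted UTF-8 byte IDs (no vocab to train)."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")]

def to_text(ids: list[int]) -> str:
    """Decode back to text, dropping any special-token IDs."""
    raw = bytes(i - NUM_SPECIAL for i in ids if i >= NUM_SPECIAL)
    return raw.decode("utf-8", errors="replace")

print(to_ids("héllo"))  # works for any language out of the box
```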
Thx for the pointers. I will get my head around the ImageNet pipeline and try to adapt it to the NLP case.
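Roughly what I have in mind for the masking and loss step, as a minimal JAX sketch (assuming a model function `apply_fn` that maps byte IDs to per-position logits; `MASK_ID`, `PAD_ID` and the 15% mask rate are my own placeholders, not necessarily the configuration used in the paper):

```python
# Minimal sketch of a BERT-style masked byte-LM loss in JAX (my own plan for
# adapting the ImageNet pipeline, not the released training code). `apply_fn`
# is an assumed model function returning logits of shape [batch, seq, vocab];
# MASK_ID, PAD_ID and the 15% mask rate are placeholders.
import jax
import jax.numpy as jnp

MASK_ID = 3      # hypothetical [MASK] id
PAD_ID = 0       # hypothetical [PAD] id
MASK_RATE = 0.15

def corrupt(rng, ids):
    """Replace ~15% of non-padding positions with MASK_ID."""
    eligible = ids != PAD_ID
    mask = (jax.random.uniform(rng, ids.shape) < MASK_RATE) & eligible
    return jnp.where(mask, MASK_ID, ids), mask

def mlm_loss(params, rng, ids, apply_fn):
    """Average cross-entropy over the masked positions only."""
    corrupted, mask = corrupt(rng, ids)
    logits = apply_fn(params, corrupted)                      # [B, T, vocab]
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    target_logp = jnp.take_along_axis(log_probs, ids[..., None], axis=-1)[..., 0]
    return -(target_logp * mask).sum() / jnp.maximum(mask.sum(), 1)
```

The idea would be to keep the released ImageNet training loop largely as-is and swap the image preprocessing and loss for this byte masking and masked cross-entropy.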
Hi @fding, would it be possible to share some of the TensorBoard logs for the byte-level LM pretraining, and/or specifics on what final MLM loss the models converge to (something similar to https://github.com/google-research/electra/issues/3)? I am trying to replicate the byte-level experiments, so these logs would be really useful as a reference. Thank you!
Do you intend to publish the training scripts for the masked LM as well?