agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs, using Transformer models.

How was ProtT5 trained? #148

Closed: Sspandau closed this 2 months ago

Sspandau commented 5 months ago

Dear Rostlab,

Do you have a notebook or code that shows how protT5_xl_uniref50 was trained?

mheinzinger commented 5 months ago

No, unfortunately I no longer have a working version of the old code, sorry. It was based on this old T5 training code, which is deprecated by now: https://github.com/google-research/text-to-text-transfer-transformer/tree/main

However, for my ProstT5 fine-tuning (which involved continued span-based pre-training with ProtT5 as the starting point), I successfully used this script: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py

Our original ProtT5 pre-training objective was closer to BERT pre-training: we only ever corrupted spans of length 1 and reconstructed the full, unmasked sequence in the output, rather than generating only those tokens that were replaced by spans. For the continued pre-training of ProtT5, I instead simply followed the original T5 span-corruption strategy implemented in the run_t5_mlm_flax.py linked above, which worked fine; the sketch below contrasts the two objectives.
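To make that difference concrete, here is a minimal, dependency-free Python sketch of the two denoising objectives on a toy sequence. The mask token name, the masking rate and the span lengths are illustrative assumptions, not the exact settings used for ProtT5 or ProstT5 training.

```python
# Toy comparison of the two denoising objectives discussed above.
# Masked positions/spans are hard-coded for clarity; real training samples them randomly.

SEQ = list("MKTAYIAKQR")  # toy protein sequence, one character per residue


def bert_style(seq, masked_positions):
    """ProtT5-style objective: corrupt single residues and let the
    decoder reconstruct the FULL, unmasked sequence."""
    inp = ["<mask>" if i in masked_positions else aa for i, aa in enumerate(seq)]
    return " ".join(inp), " ".join(seq)


def t5_span_corruption(seq, spans):
    """Original T5 objective (what run_t5_mlm_flax.py implements): replace
    contiguous spans with sentinel tokens and generate ONLY the masked spans."""
    inp, target = [], []
    i, sid = 0, 0
    while i < len(seq):
        span = next(((s, e) for (s, e) in spans if s == i), None)
        if span is not None:
            s, e = span
            sentinel = f"<extra_id_{sid}>"
            inp.append(sentinel)
            target.append(sentinel)
            target.extend(seq[s:e])
            sid += 1
            i = e
        else:
            inp.append(seq[i])
            i += 1
    target.append("</s>")
    return " ".join(inp), " ".join(target)


if __name__ == "__main__":
    print(bert_style(SEQ, masked_positions={2, 7}))
    # ('M K <mask> A Y I A <mask> Q R', 'M K T A Y I A K Q R')
    print(t5_span_corruption(SEQ, spans=[(2, 4), (7, 8)]))
    # ('M K <extra_id_0> Y I A <extra_id_1> Q R', '<extra_id_0> T A <extra_id_1> K </s>')
```

In the BERT-style variant the decoder has to emit every residue, while the T5-style target only contains the sentinels plus the residues they replaced, so the decoder side stays much shorter.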

If you want to start training from scratch, maybe this repo helps (I have not tried it, though): https://github.com/PiotrNawrot/nanoT5
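If you do go the continued pre-training route rather than from scratch, you can start from the published checkpoint. Below is a rough PyTorch sketch (not the Flax script above) of loading Rostlab/prot_t5_xl_uniref50 and computing one span-denoising loss. The hard-coded corruption, and the assumption that the ProtT5 tokenizer resolves the <extra_id_*> sentinels as shown, are mine to keep the example short, so double-check them against the tokenizer's special tokens before relying on this.

```python
import re
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the public ProtT5 checkpoint (~3B parameters) as the starting point
# for continued denoising pre-training.
tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5ForConditionalGeneration.from_pretrained("Rostlab/prot_t5_xl_uniref50")

# ProtT5 expects residues separated by spaces, with rare amino acids mapped to X.
sequence = "M K T A Y I A K Q R Q I S F V K S H F S R Q L E E R"
sequence = re.sub(r"[UZOB]", "X", sequence)

# Hand-crafted span corruption, for illustration only; a real run would sample spans
# randomly (e.g. like the data collator inside run_t5_mlm_flax.py does). Verify that
# the sentinel tokens below appear in tokenizer.additional_special_tokens.
corrupted = "M K <extra_id_0> Y I A K Q R Q I S F V K <extra_id_1> S R Q L E E R"
targets = "<extra_id_0> T A <extra_id_1> S H F"  # tokenizer appends the EOS token itself

enc = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(targets, return_tensors="pt").input_ids

with torch.no_grad():  # drop no_grad and add an optimizer step for actual training
    out = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels)
print(out.loss)
```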