PiotrNawrot / nanoT5

Fast & Simple repository for pre-training and fine-tuning T5-style models
Apache License 2.0

Continued pretraining from official models. #36

Closed IdeaKing closed 6 months ago

IdeaKing commented 6 months ago

I would like to start by saying that this is absolutely great work. I do have a minor question, though: would it be possible to continue pre-training from the original weights on HF? Or would we have to manually retrain on C4 in conjunction with our personal dataset?

PiotrNawrot commented 6 months ago

Hey, I'm glad that you liked it. Yes, you can continue pre-training directly from the original HF weights. The only thing you need to pay attention to is which implementation you will be working with: you can choose either the HF implementation or the T5 implementation from nanoT5.

Here you define which implementation you want to use. You can choose the HF implementation, load the pre-trained weights directly from HF, and it will work; you can then rely on the rest of the nanoT5 code for continued pre-training.
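A minimal sketch of the HF route, using the standard `transformers` API (the checkpoint name `t5-base` is just an example; any compatible checkpoint works):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the official pre-trained weights directly from the HF Hub.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# From here, a training loop (e.g. the rest of the nanoT5 code) can
# continue pre-training from these weights instead of a random init.
inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same object returned by `from_pretrained` can be handed to any optimizer and data loader, so continued pre-training only changes the initialization, not the training loop.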

If you want to use my code that implements T5, you need to make sure it's compatible with the T5 variant that you're loading. My code works with the default T5 for sure (so you can, for example, load T5-base weights from HF and then continue training or run inference with my implementation), but I am not 100% sure about other T5 variants, as new models are coming out every day : ).
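One quick way to check whether a checkpoint is compatible with a custom implementation is to diff the two models' state-dict keys before calling `load_state_dict`. A minimal sketch, with toy modules standing in for the HF checkpoint and the custom T5 implementation:

```python
import torch.nn as nn

def compare_state_dicts(reference, candidate):
    # Return keys present in one model's state_dict but not the other's.
    ref_keys = set(reference.state_dict().keys())
    cand_keys = set(candidate.state_dict().keys())
    return ref_keys - cand_keys, cand_keys - ref_keys

# Toy stand-ins: in practice, `reference` would be the HF model and
# `candidate` the custom implementation being validated.
hf_like = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
custom = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

missing, unexpected = compare_state_dicts(hf_like, custom)
print(sorted(missing))  # → ['2.bias', '2.weight']
```

If both returned sets are empty, `candidate.load_state_dict(reference.state_dict())` should succeed; any mismatch flags a structural difference between the variant and the implementation.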