PiotrNawrot / nanoT5

Fast & Simple repository for pre-training and fine-tuning T5-style models
Apache License 2.0

Silly question: Why do you need to re-implement T5 model? #31

Closed phucdoitoan closed 7 months ago

phucdoitoan commented 7 months ago

Hi, thank you a lot for this helpful repository.

Can I ask why you need to re-implement the T5 model instead of using the one from Hugging Face and pre-training the Hugging Face model with mixed precision directly?

PiotrNawrot commented 7 months ago

Hey, as mentioned in the README, you can actually use the T5 implementation from HF. The T5 implementation in this repo has a few characteristics (advantages imo):

1) It is fully compatible with HF, so you can load any weights from the Hub (a sketch of what that compatibility means is below).
2) It is slightly faster due to the extra tensor casts I added.
3) It is shorter: the original T5 file has more than 1k lines, which can be hard to digest for beginner ML practitioners.

Hope this clarifies :)
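A minimal sketch of what "fully compatible with HF" implies, assuming the transformers library is installed: the parameter names in the Hub checkpoint are the contract, so a re-implementation that keeps those exact names can take the checkpoint's state_dict via load_state_dict. The snippet below only inspects the HF side; the re-implementation itself is not shown.

```python
# Load a T5 checkpoint from the Hub and look at its parameter names.
# A compatible re-implementation must expose its parameters under the same
# names, e.g. "encoder.block.0.layer.0.SelfAttention.q.weight", so that
# reimpl.load_state_dict(hub_model.state_dict()) succeeds without remapping.
from transformers import T5ForConditionalGeneration

hub_model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")
state_dict = hub_model.state_dict()

for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```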

phucdoitoan commented 7 months ago

Hi, thanks a lot for your reply. I also noticed that the HF model requires more GPU memory than your implementation when trained in bf16, e.g. I get a GPU out-of-memory error with batch_size=128 even in bf16.
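For context, a minimal sketch of bf16 mixed-precision training with torch.autocast on a toy model (the model, sizes, and the batch_size=128 here are illustrative, not nanoT5's actual training loop). Under autocast the forward pass runs mostly in bf16, but any op that is upcast to float32, plus the activations saved for backward in that dtype, still costs memory, which is why two implementations of the same architecture can hit OOM at different batch sizes.

```python
import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA device with bf16 support
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(128, 512, device=device)       # batch_size=128, as in the comment above
target = torch.randn(128, 512, device=device)

optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)
loss.backward()   # bf16 autocast needs no GradScaler, unlike fp16
optimizer.step()
```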

PiotrNawrot commented 7 months ago

That's true. This is due to some extra dtype casts I added in my implementation.
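A hedged sketch, not the repo's actual code, of the kind of dtype cast that can lower activation memory under bf16 autocast: ops such as softmax are run in float32 for numerical stability, and explicitly casting their output back to bf16 means the (often large) attention-probability tensor saved for backward takes 2 bytes per element instead of 4.

```python
import torch

def attention_probs(scores: torch.Tensor) -> torch.Tensor:
    # Compute softmax in float32 for numerical stability...
    probs = torch.softmax(scores.float(), dim=-1)
    # ...then cast back to the compute dtype so the stored activation is bf16.
    return probs.to(scores.dtype)

scores = torch.randn(2, 8, 128, 128, dtype=torch.bfloat16)
print(attention_probs(scores).dtype)  # torch.bfloat16
```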