phucdoitoan closed this issue 7 months ago.
Hey, as mentioned in the README, you can actually use the T5 implementation from HF. The T5 implementation in this repo has a few characteristics (advantages, imo): 1) it is fully compatible with HF, so you can load any weights from the Hub; 2) it is slightly faster due to extra tensor casts I added; 3) it is shorter: the original T5 file has more than 1k lines, which can be hard for beginner ML practitioners to digest. Hope this clarifies :)
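For reference, a minimal sketch of what the HF compatibility means in practice. The live part uses only the standard `transformers` API; the repo's own model class is a placeholder (`RepoT5`), since the actual class name and import path aren't given in this thread:

```python
from transformers import T5ForConditionalGeneration

# Load the official HF implementation and grab its weights.
hf_model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base")
state_dict = hf_model.state_dict()

# Hypothetical sketch: if the repo's slimmed-down T5 class keeps the same
# parameter names as HF, the state dict loads without any key remapping.
# from transformers import AutoConfig
# config = AutoConfig.from_pretrained("google/t5-v1_1-base")
# model = RepoT5(config)            # RepoT5 is a placeholder class name
# model.load_state_dict(state_dict)
```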
Hi, thanks a lot for your reply. I also noticed that the HF model requires more GPU memory than your implementation when trained in bf16, e.g. I get a GPU out-of-memory error at batch_size=128 even with bf16.
That's true, this is due to some extra dtype casts I added in my implementation.
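For anyone comparing memory, this is roughly what "training the HF model in bf16" looks like with plain autocast. A minimal sketch only: the checkpoint name and batch shapes are placeholders, not the setup used in this issue:

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder batch: random token ids over T5's 32128-token vocabulary.
input_ids = torch.randint(0, 32128, (8, 512), device="cuda")
labels = torch.randint(0, 32128, (8, 114), device="cuda")

# bf16 autocast: activations run in bfloat16 while master weights stay fp32.
# Unlike fp16, bf16 needs no GradScaler because it keeps fp32's exponent range.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(input_ids=input_ids, labels=labels).loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Autocast still keeps some intermediates in fp32, which is presumably where the extra explicit casts in this repo's implementation save memory relative to the stock HF model.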
Hi, thank you a lot for this helpful repo.
Can I ask why you needed to re-implement the T5 model instead of using the one from HuggingFace and pretraining the HuggingFace model with mixed precision directly?