Closed Sanqiang closed 3 years ago
I tried to reproduce the model on an A100 (40GB) GPU, but T5-3B/11B does not fit without model parallelism or DeepSpeed. How did you fit the large models on TPU? Did you use half precision (fp16) or something else?
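For context, here is a rough back-of-the-envelope estimate of why these models are tight on a single 40GB card. It is only a sketch under assumptions that are mine, not the authors': nominal parameter counts of 3e9 and 11e9, an Adam-style optimizer keeping two fp32 states per parameter, and activations ignored entirely. The function name `training_memory_gib` is hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def training_memory_gib(num_params, weight_bytes, grad_bytes, optimizer_bytes):
    """Rough lower bound on memory in GiB, counting only per-parameter state.

    Activations, buffers, and framework overhead are ignored, so the real
    footprint is higher than this number.
    """
    return num_params * (weight_bytes + grad_bytes + optimizer_bytes) / GIB


# Nominal parameter counts taken from the model names (assumption).
NOMINAL_PARAMS = {"T5-3B": 3e9, "T5-11B": 11e9}

for name, n in NOMINAL_PARAMS.items():
    # fp32 weights (4 B) + fp32 gradients (4 B) + two fp32 Adam states (8 B)
    fp32_adam = training_memory_gib(n, 4, 4, 8)
    # 16-bit weights only, e.g. for inference (2 B per parameter)
    half_weights = training_memory_gib(n, 2, 0, 0)
    print(f"{name}: ~{fp32_adam:.1f} GiB for fp32 + Adam training state, "
          f"~{half_weights:.1f} GiB for 16-bit weights alone")
```

Under these assumptions, even the 3B model needs more than 40 GiB of training state before counting activations, which matches what I see on the A100. The 16-bit weights alone would fit, so half precision might help for inference but probably not for full fine-tuning without sharding or offloading.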