csanadpoda opened 1 year ago
Hi @csanadpoda , yes, we used fp16 ( https://github.com/clovaai/donut/blob/master/train.py#L127 ). Hope this helps ;)
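For intuition on why fp16 helps here: halving the bytes per value roughly halves the storage for activations and parameters. A back-of-envelope sketch in plain Python (the tensor shape is purely illustrative, not taken from the actual donut model):

```python
# Back-of-envelope: bytes needed for one activation tensor at fp32 vs fp16.
# The shape below is hypothetical, chosen only to illustrate the scaling.
def tensor_bytes(batch, channels, height, width, bytes_per_value):
    return batch * channels * height * width * bytes_per_value

shape = (8, 256, 1280 // 32, 960 // 32)  # e.g. a downsampled feature map
fp32_bytes = tensor_bytes(*shape, bytes_per_value=4)
fp16_bytes = tensor_bytes(*shape, bytes_per_value=2)
print(fp16_bytes / fp32_bytes)  # 0.5: fp16 stores the same tensor in half the bytes
```

The real savings depend on what stays in fp32 under mixed precision (master weights, some reductions), but "roughly half the activation memory" is the headline effect.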
Yes, I'm using the same script, so I guess the resolution increase is the issue on my end (I'm training at [1600, 1920]). That's 1,228,800 vs. 3,072,000 pixels to process: the [1280, 960] input is only 40% (less than half) of mine, so the difference makes sense. Reducing the resolution to [1280, 960] lets me use a batch size of 2 (maybe even 3).
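The pixel arithmetic above, spelled out (plain Python, just multiplication):

```python
# Pixel counts for the two input resolutions discussed above.
mine = 1600 * 1920   # 3,072,000 pixels
cord = 1280 * 960    # 1,228,800 pixels (train_cord.yaml resolution)
print(cord / mine)   # 0.4: the smaller input has less than half the pixels
```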
In the "Training" section, you mention you used a single A100 with the attached config yaml. An A100 has either 40 or 80GB of VRAM. The batch size is set to 8 in train_cord.yaml with a resolution of [1280, 960].
On a 24 GB 4090 with
torch.set_float32_matmul_precision('high')
and a resolution of around [1920, 1600] (if I remember correctly, but definitely somewhat above [1280, 960]), a batch size of 1 already takes up 20+ GB of VRAM. Yet according to this GitHub page you managed to fit a batch size of 8: eight times my batch size on roughly four times the VRAM. May I ask how this was done? Did you use lower precision, or does the resolution make such a huge difference? Thank you!
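Assuming the A100 run used fp16 while my run is fp32-dominant, activation memory scales roughly with batch × pixels × bytes per value. A rough comparison (a simplified model: it ignores model weights, optimizer state, and attention's quadratic terms, so treat the numbers as order-of-magnitude only):

```python
# Rough activation-memory scaling: batch * pixels * bytes_per_value.
# Simplified model; ignores weights, optimizer state, and attention costs.
a100_setup = 8 * (1280 * 960) * 2   # batch 8, [1280, 960], fp16 (2 bytes/value)
my_setup   = 1 * (1920 * 1600) * 4  # batch 1, [1920, 1600], fp32 (4 bytes/value)
print(a100_setup / my_setup)  # ~1.6x my per-step activation load
```

Under these assumptions the A100 setup only needs about 1.6× my per-step activation memory, which fits comfortably in its extra VRAM; without fp16 the same batch would need roughly twice that.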