Hello,
I found your paper very interesting. However, when I run the code, I encounter the "CUDA out of memory" error, even with four 80 GB A100 GPUs. Do I need to implement mixed precision operations? Could you please provide your accelerate configuration? That would be very helpful. Thank you so much!