Open zhaowenZhou opened 1 year ago
Has anyone tried torch.cuda.amp? It seems the ms_attention op doesn't support fp16, even after I modified ms_deform_attn_forward_cuda. Is there another way to get amp working, or any other way to reduce GPU memory? I hit a CUDA OOM every time with bs=4.
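For context, here is roughly the workaround I've been attempting: a minimal sketch, assuming the Deformable-DETR-style `MSDeformAttnFunction` interface (the import path, and the `model`/`criterion`/`optimizer`/`dataloader` names in the loop, are placeholders, not this repo's actual API). The idea is to run the rest of the network under autocast but drop back to fp32 just for the custom op, since the CUDA kernel only implements float32:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Assumption: MSDeformAttnFunction is the custom autograd Function from the
# Deformable-DETR ops package; the import path may differ in this repo.
from models.ops.functions import MSDeformAttnFunction


def ms_deform_attn_fp32(value, spatial_shapes, level_start_index,
                        sampling_locations, attention_weights, im2col_step):
    # Exit the autocast region and cast the floating-point inputs back to
    # fp32 before calling the fp32-only CUDA kernel.
    with autocast(enabled=False):
        return MSDeformAttnFunction.apply(
            value.float(), spatial_shapes, level_start_index,
            sampling_locations.float(), attention_weights.float(),
            im2col_step)


# The rest of the model runs under autocast as usual; names below are
# placeholders for illustration.
scaler = GradScaler()
for samples, targets in dataloader:
    optimizer.zero_grad()
    with autocast():
        outputs = model(samples)
        loss = criterion(outputs, targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Even with this, the attention op itself still runs in fp32, so the memory savings are smaller than full fp16, and I'm still OOMing at bs=4.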