Open shamanez opened 10 months ago
Hi there. Sorry for the late response. I ran the experiment on 4*A100 40GB and didn't encounter the OOM problem. Did you enable flash-attention?
I also encountered this problem. I tested on 4*A100 40GB with num_nodes=1, num_gpu_per_node=4, and bsz=16. Is there anything wrong with these settings?
I am using 8*A100 40GB.