CUDA out of memory occurs when the training sequence length reaches 450k on an A100 (80 GB). I'm using the Hugging Face version, hyenadna-medium-450k-seqlen-hf, for the species classification task. Is the Hugging Face version optimized? I don't see flash-attn being used anywhere.