NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] When pretraining BERT, hit bug: cuBLAS Error: the requested functionality is not supported #876

Open shanyuaa opened 1 week ago

shanyuaa commented 1 week ago

Your question:

The relevant part of my train_bert_340m_distributed.sh is as follows:

CHECKPOINT_PATH=/proj/bert/checkpoints/null/
TENSORBOARD_LOGS_PATH=/proj/bert/logs
VOCAB_FILE=/proj/bert/checkpoints/bert-large-uncased-vocab.txt
DATA_PATH=/proj/bert/dataset/ag_news_text_sentence

the error is:

[Screenshot 2024-06-18 20:53:31 of the error output]
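Since the full error text is only in the screenshot, here is a minimal diagnostic sketch for one common cause of this cuBLAS message: requesting a dtype (e.g. bf16) that the local GPU's compute capability cannot execute. The 8.0 threshold reflects bf16 requiring Ampere-or-newer hardware; whether that is the cause here is an assumption, not something confirmed by this report.

```python
# Hypothetical diagnostic: check whether the local GPU predates bf16 support.
# "cuBLAS Error: the requested functionality is not supported" can appear when
# a GEMM is requested in a dtype the GPU cannot execute (assumed cause here).

def lacks_bf16(capability):
    """bf16 tensor ops need compute capability >= 8.0 (Ampere or newer)."""
    major, _minor = capability
    return major < 8

try:
    import torch  # only needed to query the local device
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        print(f"Compute capability: {cap[0]}.{cap[1]}")
        if lacks_bf16(cap):
            print("GPU predates bf16; consider fp16/fp32 training flags.")
    else:
        print("No CUDA device visible")
except ImportError:
    print("PyTorch not installed; cannot query the local GPU")
```

If the check flags the GPU, re-running the script without a bf16 flag (or with fp16/fp32 instead) would narrow down whether the dtype is the trigger.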