Open · fabiancpl opened this issue 2 months ago
@fabiancpl sorry that the Megatron-LM example is not working for you; we'll look into it.
If you are interested in fine-tuning an existing checkpoint, TE has an integration with accelerate, and we have an example for `bert-base-cased` here: https://github.com/huggingface/accelerate/tree/main/benchmarks/fp8.
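For reference, the core of that integration looks roughly like the sketch below: you pass an FP8 recipe to `Accelerator`, and `prepare` swaps supported layers for their TE equivalents. This is a minimal sketch rather than the benchmark script itself; it assumes an FP8-capable GPU (Hopper/Ada), and the optimizer, batch, and recipe values are illustrative.

```python
# Minimal FP8 fine-tuning sketch with accelerate + Transformer Engine.
# Assumptions: FP8-capable GPU, TE installed; hyperparameters are illustrative.
import torch
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# HYBRID recipe: E4M3 in the forward pass, E5M2 in the backward pass.
fp8_kwargs = FP8RecipeKwargs(backend="TE", fp8_format="HYBRID", amax_history_len=16)
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[fp8_kwargs])

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# prepare() converts supported nn.Linear/LayerNorm modules to TE layers.
model, optimizer = accelerator.prepare(model, optimizer)

batch = tokenizer(["an FP8 smoke test"], return_tensors="pt").to(accelerator.device)
labels = torch.tensor([0], device=accelerator.device)
outputs = model(**batch, labels=labels)
accelerator.backward(outputs.loss)
optimizer.step()
```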
Hi guys,
I am following the Megatron-LM example to pre-train a BERT model, but I'm getting this error:
I'm using transformer_engine==1.8.0+3ec998e with Megatron-LM core_v0.7.0. My pre-training script looks like this:
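Independently of the full script, here is a minimal standalone check that should exercise the same FP8 path (the layer sizes and recipe values are illustrative, not taken from my script, and it assumes an FP8-capable GPU):

```python
# Standalone FP8 smoke test for the installed TE build.
# Assumptions: FP8-capable GPU (e.g. Hopper); sizes/recipe values are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16,
                        amax_compute_algo="max")
layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda", requires_grad=True)

# Forward in FP8; backward accumulates gradients as usual.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
y.sum().backward()
print("TE FP8 forward/backward completed:", tuple(y.shape))
```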
I'm also interested in using the BERT cased checkpoint instead of pre-training from scratch.
Thanks in advance.