bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

hello, I met a problem #386

Open etoilestar opened 1 year ago

etoilestar commented 1 year ago

hello, when I run the script to train a GPT model, I hit an assertion error: "Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed." The script I used is https://github.com/bigscience-workshop/Megatron-DeepSpeed#deepspeed-pp-and-zero-dp. Can you tell me why?

tjruwase commented 1 year ago

Can you please share the assertion message and stack trace?

tjruwase commented 1 year ago

Please try https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/run_bf16.sh or the equivalent run_fp16.sh

etoilestar commented 1 year ago

OK, I will give it a try. On another note, I cannot find the BF16Optimizer mentioned at https://huggingface.co/blog/zh/bloom-megatron-deepspeed#bf16optimizer, could you give me some tips?

tjruwase commented 1 year ago

https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/bf16_optimizer.py
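For context, DeepSpeed selects that optimizer wrapper based on the config handed to deepspeed.initialize. A minimal sketch of a config that enables bf16 (the key names follow the DeepSpeed docs; the batch size and ZeRO stage below are illustrative placeholders, not values from this issue):

# Sketch only: enabling bf16 in the DeepSpeed config so the engine uses the
# bf16 optimizer path defined in deepspeed/runtime/bf16_optimizer.py.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder value
    "bf16": {"enabled": True},             # turns on the bf16 optimizer path
    "zero_optimization": {"stage": 0},     # placeholder value
}
# This dict would then be passed to deepspeed.initialize via config=ds_config,
# or written to a JSON file referenced by --deepspeed_config.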

hymie122 commented 1 year ago

I met the same problem when following "start_fast.md". I want to know how to solve it. Thank you!

AoZhang commented 1 year ago

Commenting out args=args at line 429 of megatron/training.py solves this problem:

model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
    # args=args,  # commented out: args carries --deepspeed_config, which clashes with config= above
)

murphypei commented 1 year ago

deepspeed.initialize cannot be given both config and args.deepspeed_config; you should remove one of them.
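To illustrate, here is a minimal sketch of the two valid ways to supply the configuration (variable names follow the snippet above; use one option or the other, never both):

# Option 1: pass the config dict explicitly and do not forward args.
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
)

# Option 2: let DeepSpeed read the JSON path from args.deepspeed_config
# (set by --deepspeed_config on the command line) and omit config=.
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    args=args,
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
)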

divisionblur commented 3 months ago

Commenting out args=args at line 429 of megatron/training.py solves this problem.

model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
    #args=args,
)

jesus!!!!!!