bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Is this assertion for mask wrong? #400

Open yinfangchen opened 7 months ago

yinfangchen commented 7 months ago

I got `AssertionError: Mask is silently ignored due to the use of a custom kernel` when training GPT-2 with `examples/pretrain_gpt.sh`.

The error is raised by this assertion: https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/8387ae17c4704f6579f88a84500b535d19d7fbbf/megatron/model/fused_softmax.py#L191

Is this assertion necessary? And is it even correct?
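
For context, the assert sits in the fused causal-softmax path: the custom CUDA kernel builds the upper-triangular (causal) mask internally, so any mask the caller passes in would never be applied. Below is a minimal, self-contained sketch in plain PyTorch (not the actual kernel, and the function names here are illustrative, not Megatron's) of the behavior the assert guards against:

```python
# Sketch of why the assert exists: the fused causal kernel builds the
# triangular mask internally, so a mask passed in by the caller would be
# silently dropped. Emulated here in plain PyTorch.
import torch

def fused_causal_softmax(scores):
    # Emulates the fused kernel: the causal mask is generated inside.
    sq, sk = scores.shape[-2:]
    causal = torch.triu(torch.ones(sq, sk, dtype=torch.bool), diagonal=1)
    return torch.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)

def unfused_masked_softmax(scores, mask):
    # Emulates the fallback path: applies whatever mask the caller provides.
    return torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)

scores = torch.randn(2, 4, 8, 8)  # [batch, heads, sq, sk]
causal = torch.triu(torch.ones(8, 8, dtype=torch.bool), diagonal=1)

# When the caller's mask is exactly the causal mask, both paths agree,
# so dropping the mask is harmless...
assert torch.allclose(fused_causal_softmax(scores),
                      unfused_masked_softmax(scores, causal))

# ...but any extra masking the caller asked for is silently lost in the
# fused path, which is exactly the situation the assert is meant to catch.
custom = causal.clone()
custom[5, 2] = True  # additionally mask key 2 for query 5
assert not torch.allclose(fused_causal_softmax(scores),
                          unfused_masked_softmax(scores, custom))
print("fused path ignores the extra masking, as the assert warns")
```

If you actually need a custom mask, then (assuming this fork keeps upstream Megatron's arguments) passing `--no-masked-softmax-fusion` should route attention through the unfused softmax path, which does apply the mask you pass in.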

LordEdison commented 5 months ago

Same puzzlement here.