Closed sbmaruf closed 3 weeks ago
Hey @sbmaruf, this issue is due to this line using `and` instead of `,` to separate the rng and fp8 contexts. It's been like this since the introduction of the fp8 context in June 2023. Since we don't use dropout in many models that we train internally, we haven't studied the impact this has on training.
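For anyone wondering why `and` matters here, below is a minimal standalone sketch of the Python behaviour, not the actual Megatron code. The `tracked` helper and the `"rng"` / `"fp8"` labels are hypothetical stand-ins for the real contexts. `with a and b:` evaluates the boolean expression `a and b` first. Context manager objects are truthy by default, so the expression yields `b`, and only `b` is entered.

```python
from contextlib import contextmanager

entered = []  # records which contexts were actually entered


@contextmanager
def tracked(name):
    # Hypothetical stand-in for a real context (e.g. an RNG fork or fp8 autocast).
    # The body only runs when the context manager is entered by `with`.
    entered.append(name)
    yield name


def run_with_and():
    entered.clear()
    # `tracked("rng") and tracked("fp8")` is a single expression that
    # evaluates to the second object, so the first is never entered.
    with tracked("rng") and tracked("fp8"):
        pass
    return list(entered)


def run_with_comma():
    entered.clear()
    # The comma form enters both context managers, left to right.
    with tracked("rng"), tracked("fp8"):
        pass
    return list(entered)


print(run_with_and())    # only "fp8" is entered
print(run_with_comma())  # both "rng" and "fp8" are entered
```

If the real contexts behave like these stand-ins, the `and` form would silently skip the rng context, which would only matter for code paths that draw random numbers inside the block, such as dropout.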
Thanks for the reply.
Looking back at the recent release of mcore 0.9.0:
Do you know from which version this breaking change occurred? What are the effects of this issue during training?
@ko3n1g