Open rikabi89 opened 2 hours ago
Any idea why this is happening? In this case it is a 1.5-hour dataset.
It appears to be related to FP16? I switched it to "none" and everything was fine.
@rikabi89 Sure, this kind of issue is related to gradient explosion. Use bf16 or none in this case.
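For anyone wondering why bf16 can survive where FP16 does not, here is a rough sketch using only the Python standard library. It is not taken from this project's training code; the helper names `to_fp16` and `to_bf16` and the sample gradient value are made up for illustration, and the bf16 conversion is approximated by truncating a float32 rather than rounding it. The point is just that FP16 tops out at 65504, while bf16 keeps the float32 exponent range, so a large gradient stays finite.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision; map overflow to +/-inf as hardware would."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bf16 by keeping the top 16 bits of the float32 encoding.

    Assumption: plain truncation, not round-to-nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


grad = 1.0e5  # hypothetical large gradient, above FP16_MAX

print("fp16:", to_fp16(grad))  # overflows
print("bf16:", to_bf16(grad))  # stays finite, with reduced precision
```

Once a gradient or loss overflows to inf in FP16, later steps tend to produce NaN, which would match what was reported here. Setting mixed precision to bf16 or "none" avoids that particular range limit, though it does not fix a genuinely diverging run.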