Open cassanof opened 1 week ago
@cassanof Do you have a script that replicates this error? I'm not able to reproduce it with the same recipe. If not, could you give a more detailed stack trace with the argument types to tex.fused_amax_and_scale_update_after_reduction?
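If the model itself can't be shared, one option is to log only the type, dtype, shape, and device of each argument just before the failing call. The helper below is a minimal sketch: `describe_args` is a hypothetical name, not part of Transformer Engine, and it only assumes that tensor-like objects expose `dtype`, `shape`, and `device` attributes.

```python
def describe_args(*args):
    """Return one summary line per argument (type, plus dtype/shape/device if present).

    Hypothetical debugging helper, not a Transformer Engine API.
    """
    lines = []
    for i, arg in enumerate(args):
        parts = [f"arg[{i}]", type(arg).__name__]
        # Tensor-like objects usually expose these attributes; skip any that are missing.
        for attr in ("dtype", "shape", "device"):
            if hasattr(arg, attr):
                parts.append(f"{attr}={getattr(arg, attr)}")
        # For plain containers, report only the length so no data is leaked.
        if isinstance(arg, (list, tuple)):
            parts.append(f"len={len(arg)}")
        lines.append(" ".join(parts))
    return lines
```

Printing `"\n".join(describe_args(*call_args))` right before the call (where `call_args` stands for whatever is being passed) would give the argument types without exposing any weights or data.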
Hi! Unfortunately I cannot share a script, and I wasn't able to reproduce the error with some of the open models. The arguments are a long list of different tensors.
In the end, I was able to get amax scaling to work by completely disabling the fused kernel in your code and using the non-fused one instead. This is obviously undesirable, though.
Currently getting the following error on a simple forward with a transformer model when using DelayedScaling:
The recipe is quite simple: te_recipe.DelayedScaling(te_recipe.Format.HYBRID, amax_history_len=64, amax_compute_algo="max"). If I omit the recipe from the autocast context the forward works as expected. Any ideas?
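For readers unfamiliar with what this recipe asks for, here is a rough pure-Python sketch of delayed scaling for a single tensor: the scale is derived from the largest absolute value (amax) seen over a window of recent steps, here with the "max" algorithm. It is a toy model under stated assumptions, not Transformer Engine's fused kernel; the class name `AmaxHistory` and its methods are invented for illustration. With the HYBRID format, forward tensors use E4M3 and gradients use E5M2.

```python
from collections import deque

# Largest finite magnitudes of the two FP8 formats.
FP8_MAX = {"E4M3": 448.0, "E5M2": 57344.0}


class AmaxHistory:
    """Toy model of delayed scaling for one tensor (illustrative only)."""

    def __init__(self, fp8_format="E4M3", history_len=64, margin=0):
        self.fp8_max = FP8_MAX[fp8_format]
        # Rolling window: the oldest amax drops out once history_len is reached.
        self.history = deque(maxlen=history_len)
        self.margin = margin
        self.scale = 1.0

    def record(self, values):
        """Store the amax (largest absolute value) of the current step."""
        self.history.append(max(abs(v) for v in values))

    def update_scale(self):
        """Recompute the scale using the "max" algorithm over the window."""
        amax = max(self.history, default=0.0)
        # Keep the previous scale if amax is zero or not finite.
        if 0.0 < amax < float("inf"):
            self.scale = self.fp8_max / amax / (2 ** self.margin)
        return self.scale
```

In this sketch, `history_len=64` corresponds to `amax_history_len=64` in the recipe, and taking the maximum over the window corresponds to `amax_compute_algo="max"`.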