chavinlo opened 1 year ago
*I don't know much about LM training, so excuse me if I am missing something obvious.*
I think it is because of the delay pattern applied to the EnCodec codebooks.
In `compute_predictions`, there is a `pattern.revert_pattern_logits` call:
```python
# note: we use nans as special token to make it obvious if we feed unexpected logits
logits, logits_indexes, logits_mask = pattern.revert_pattern_logits(
    logits, float('nan'), keep_only_valid_steps=True
)
```
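If I understand the pattern code right, those NaNs are exactly the (codebook, timestep) positions that the delay pattern leaves without a prediction. A small sketch of that, assuming the `DelayedPatternProvider` that MusicGen uses by default (the input shape `[B, card, K, S]` follows the permutes in `lm.py`):

```python
import torch
from audiocraft.modules.codebooks_patterns import DelayedPatternProvider

K, T, card = 4, 1500, 2048
pattern = DelayedPatternProvider(n_q=K).get_pattern(T)

# build a dummy interleaved sequence just to recover its length S
codes = torch.zeros(1, K, T, dtype=torch.long)
seq, _, _ = pattern.build_pattern_sequence(codes, card, keep_only_valid_steps=True)
S = seq.shape[-1]

# reverting fills every position that no sequence step predicts with NaN
dummy = torch.zeros(1, card, K, S)
logits, _, _ = pattern.revert_pattern_logits(
    dummy, float('nan'), keep_only_valid_steps=True
)
print(torch.isnan(logits).sum().item())
# expected: 12288 = (0 + 1 + 2 + 3) delayed steps * 2048 classes,
# matching the count reported below
```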
I think it is because you did not use automatic mixed precision for training, so it returns all NaNs after the first backward pass.
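(In case it helps, a generic `torch.cuda.amp` training step of the kind meant here; `model_step` is a placeholder for the forward pass plus loss:)

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model_step, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # run the forward pass in mixed precision
        loss = model_step(batch)
    scaler.scale(loss).backward()      # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)             # unscales grads, skips the step on inf/NaN
    scaler.update()
```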
I am trying it in the following way. This is a minified version of my code, but it should replicate the exact problem:
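(The snippet itself is not shown here; below is a sketch of the setup with assumed names: a pretrained MusicGen, random `codes` standing in for EnCodec tokens, and condition tensors pre-computed through the LM's condition provider.)

```python
import torch
from audiocraft.models import MusicGen
from audiocraft.modules.conditioners import ConditioningAttributes

model = MusicGen.get_pretrained('small')  # checkpoint name assumed
lm = model.lm
optimizer = torch.optim.AdamW(lm.parameters(), lr=1e-4)  # settings assumed

codes = torch.randint(0, 2048, (1, 4, 1500))  # stand-in EnCodec tokens [B, K, T]
attributes = [ConditioningAttributes(text={'description': 'piano music'})]
tokenized = lm.condition_provider.tokenize(attributes)
condition_tensors = lm.condition_provider(tokenized)  # pre-processed tensors

# conditions must be empty when condition_tensors is passed
out = lm.compute_predictions(codes, [], condition_tensors)
print(out.logits.shape)
```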
The logits come out with size `torch.Size([1, 4, 1500, 2048])`, i.e. `[B, K, T, card]`.
At first, most of the values are "normal": in this tensor, for example, there are 12,288 NaNs. (That count is exactly 6 positions × 2048 classes, i.e. the 0 + 1 + 2 + 3 timesteps the delays leave uncovered across the 4 codebooks.)
Since I am trying to train, I replace those NaNs with 0s, but after one backward pass it starts returning everything as NaN.
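(Side note: `compute_predictions` also returns a mask of the valid positions alongside the logits, so the invalid positions can be dropped from the loss instead of zeroed, which keeps the NaNs out of the backward pass entirely. A sketch of such a masked cross-entropy, similar in spirit to the one in audiocraft's MusicGen solver:)

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                         mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over valid positions only.

    logits: [B, K, T, card], targets: [B, K, T], mask: [B, K, T] (bool).
    """
    B, K, T, card = logits.shape
    ce = torch.zeros([], device=logits.device)
    for k in range(K):
        logits_k = logits[:, k].reshape(-1, card)    # [B*T, card]
        targets_k = targets[:, k].reshape(-1)        # [B*T]
        mask_k = mask[:, k].reshape(-1)              # [B*T]
        # indexing with the mask keeps the NaN logits out of the loss
        # and out of the gradient computation entirely
        ce = ce + F.cross_entropy(logits_k[mask_k], targets_k[mask_k])
    return ce / K
```

The training step would then use `loss = masked_cross_entropy(out.logits, codes, out.mask)` rather than zeroing the NaNs first.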
Before this, I also tried another approach: passing the `ConditioningAttributes` in `conditions` to `compute_predictions`, rather than the pre-processed tensors. However, this always returns NaNs.
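(For reference, that attempt would look roughly like this; the description string is a placeholder, and `condition_tensors` has to stay `None` when raw conditions are passed:)

```python
from audiocraft.modules.conditioners import ConditioningAttributes

# let compute_predictions tokenize and run the conditioners itself
attributes = [ConditioningAttributes(text={'description': 'piano music'})]
out = lm.compute_predictions(codes, attributes, condition_tensors=None)
```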