NandhaKishorM opened 1 month ago
An error occurs during inference on `output = model.generate(inputs, max_new_tokens=512, temperature=0.1)`, which fails inside `GenerationMixin._sample`:

```
in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, model_kwargs)
   3041         probs = nn.functional.softmax(next_token_scores, dim=-1)
   3042         # TODO (joao): this OP throws "skipping cudagraphs due to ['incompatible ops']", find solution
-> 3043         next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   3044     else:
   3045         next_tokens = torch.argmax(next_token_scores, dim=-1)

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
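For context, here is a minimal sketch (not from my actual setup) of how this state can arise: if any logit overflows to `inf` (which float16 weights plus a small temperature make more likely, since the scores are divided by the temperature), the softmax produces `nan`, and `torch.multinomial` rejects the probability tensor with exactly this error.

```python
import torch

# Minimal illustration: a single inf logit poisons the softmax with nan,
# which is exactly the tensor torch.multinomial refuses to sample from.
logits = torch.tensor([float("inf"), 1.0, -2.0])
probs = torch.softmax(logits, dim=-1)
print(probs)  # tensor([nan, nan, nan])
# torch.multinomial(probs, num_samples=1) would raise:
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```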
Looks like it's an error from Llama; it has been around since Llama 2.
https://github.com/meta-llama/llama/issues/380
You could check out this link; a sketch of the common workarounds is below.
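Here is a sketch of the mitigations commonly suggested in that thread, adapted for a Hugging Face `transformers` model. The model id below is a placeholder, not the checkpoint from this report:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, swap in your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16/float32 are less overflow-prone than float16
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# temperature only has an effect when do_sample=True; alternatively,
# greedy decoding (do_sample=False) avoids torch.multinomial entirely.
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the failure persists even in bfloat16, it is worth inspecting the logits from a plain forward pass for `nan`/`inf` before blaming the sampler.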