NandhaKishorM opened 1 month ago
An error occurs during inference on `output = model.generate(inputs, max_new_tokens=512, temperature=0.1)`, which fails inside `GenerationMixin._sample`:

```
in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, model_kwargs)
   3041         probs = nn.functional.softmax(next_token_scores, dim=-1)
   3042         # TODO (joao): this OP throws "skipping cudagraphs due to ['incompatible ops']", find solution
-> 3043         next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   3044     else:
   3045         next_tokens = torch.argmax(next_token_scores, dim=-1)

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
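For context, here is a minimal sketch (not from my actual setup) of how this state can arise: if any logit overflows to `inf` (which float16 weights plus a small temperature make more likely, since the scores are divided by the temperature), the softmax produces `nan`, and `torch.multinomial` rejects the probability tensor with exactly this error.

```python
import torch

# Minimal illustration: a single inf logit poisons the softmax with nan,
# which is exactly the tensor torch.multinomial refuses to sample from.
logits = torch.tensor([float("inf"), 1.0, -2.0])
probs = torch.softmax(logits, dim=-1)
print(probs)  # tensor([nan, nan, nan])
# torch.multinomial(probs, num_samples=1) would raise:
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```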
Looks like it's an error from Llama; it has been around since Llama 2.
https://github.com/meta-llama/llama/issues/380
You could check out this link; a sketch of the common workarounds is below.
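Here is a sketch of the mitigations commonly suggested in that thread, adapted for a Hugging Face `transformers` model. The model id below is a placeholder, not the checkpoint from this report:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, swap in your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bfloat16/float32 are less overflow-prone than float16
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# temperature only has an effect when do_sample=True; alternatively,
# greedy decoding (do_sample=False) avoids torch.multinomial entirely.
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the failure persists even in bfloat16, it is worth inspecting the logits from a plain forward pass for `nan`/`inf` before blaming the sampler.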