The relevant part of that error message is this:
```
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
```
When you give `input_size` to `summary`, it produces a random Tensor of type float (torch.float32, PyTorch's default, I believe). If your model needs inputs with another data type, try using `input_data` instead.
For example:

```python
import torch
import torchinfo
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config.from_pretrained('t5-large')
# Actively define the inputs, instead of just their shapes, so the
# embedding layer receives integer indices rather than floats.
input_data = torch.ones(1, config.max_length, dtype=torch.int)
model = T5ForConditionalGeneration.from_pretrained('t5-large')
summary = torchinfo.summary(model, input_data=input_data, device="cpu")
```
Try this (you might have to change the dtype to torch.long if it still complains about the input type) and if it works, report back and close the issue :)
Thanks for your help!
I get further this time:
```
site-packages/transformers/models/t5/modeling_t5.py:969, in T5Stack.forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    968     err_msg_prefix = "decoder_" if self.is_decoder else ""
--> 969     raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")
    971     if inputs_embeds is None:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[9], line 1
----> 1 summary = torchinfo.summary(model, input_data=input_data, device="cpu")
...
RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: [T5Stack: 1, Embedding: 2, Dropout: 2, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Embedding: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, ... (the same sub-layer pattern repeats for each of the remaining T5Blocks) ..., T5LayerNorm: 2, Dropout: 2]
```
I think I need a more fleshed-out `input_data`. According to https://stackoverflow.com/questions/65140400/valueerror-you-have-to-specify-either-decoder-input-ids-or-decoder-inputs-embed it should be something like (input_ids, attention_mask, decoder_input_ids). I guess our current input covers just the input_ids part? Or it's related to T5 being a seq2seq model: https://stackoverflow.com/a/66117248/202168
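To sanity-check that outside torchinfo, calling the model's forward directly works once a decoder input is supplied (a quick sketch reusing `model` and `input_data` from above):

```python
# With only input_ids, T5's forward raises the same ValueError;
# adding decoder_input_ids lets the forward pass run.
outputs = model(input_ids=input_data, decoder_input_ids=input_data)
print(outputs.logits.shape)  # torch.Size([1, seq_len, vocab_size])
```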
This worked:

```python
summary = torchinfo.summary(model, input_data=(input_data, input_data, input_data), device="cpu")
```
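For reference, the three tensors in that tuple are passed positionally, so they land on input_ids, attention_mask and decoder_input_ids. If I read the torchinfo docs correctly, unknown keyword arguments are forwarded to the model's forward, so a more explicit version should also work (untested sketch):

```python
# Sketch, assuming torchinfo forwards extra kwargs to model.forward:
# name the decoder input explicitly instead of relying on position.
summary = torchinfo.summary(
    model,
    input_data=input_data,          # becomes input_ids
    decoder_input_ids=input_data,   # forwarded to T5's forward
    device="cpu",
)
```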
Thanks again for your help!
Hi,

I am trying to summarise a model from the HuggingFace Hub and I get the RuntimeError quoted at the top of this thread.

Is it just that some types of model are unsupported, or am I doing something wrong?

Thanks :)