The relevant part of that error message is this:
```
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
```
When you give `input_size` to `summary`, it produces a random Tensor of type float (torch.float32, PyTorch's default, I believe). If your model needs inputs with another data type, try using `input_data` instead.
For example:

```python
import torch
import torchinfo
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config.from_pretrained('t5-large')
# Actively define the inputs, instead of just their shapes, so the
# embedding layer receives integer indices rather than floats.
input_data = torch.ones(1, config.max_length, dtype=torch.int)
model = T5ForConditionalGeneration.from_pretrained('t5-large')
summary = torchinfo.summary(model, input_data=input_data, device="cpu")
```
Try this (you might have to change the dtype to torch.long if it still complains about the input type) and if it works, report back and close the issue :)
Thanks for your help!
I get further this time:
```
site-packages/transformers/models/t5/modeling_t5.py:969, in T5Stack.forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    968     err_msg_prefix = "decoder_" if self.is_decoder else ""
--> 969     raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")
    971     if inputs_embeds is None:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[9], line 1
----> 1 summary = torchinfo.summary(model, input_data=input_data, device="cpu")
...
RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: [T5Stack: 1, Embedding: 2, Dropout: 2, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Embedding: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, ... (the same sub-layer pattern repeats for each of the remaining T5Blocks) ..., T5LayerNorm: 2, Dropout: 2]
```
I think I need a more fleshed-out `input_data`. According to https://stackoverflow.com/questions/65140400/valueerror-you-have-to-specify-either-decoder-input-ids-or-decoder-inputs-embed it should be something like (input_ids, attention_mask, decoder_input_ids). I guess our current input covers just the input_ids part? Or it's related to T5 being a seq2seq model: https://stackoverflow.com/a/66117248/202168
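To sanity-check that outside torchinfo, calling the model's forward directly works once a decoder input is supplied (a quick sketch reusing `model` and `input_data` from above):

```python
# With only input_ids, T5's forward raises the same ValueError;
# adding decoder_input_ids lets the forward pass run.
outputs = model(input_ids=input_data, decoder_input_ids=input_data)
print(outputs.logits.shape)  # torch.Size([1, seq_len, vocab_size])
```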
This worked:

```python
summary = torchinfo.summary(model, input_data=(input_data, input_data, input_data), device="cpu")
```
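For reference, the three tensors in that tuple are passed positionally, so they land on input_ids, attention_mask and decoder_input_ids. If I read the torchinfo docs correctly, unknown keyword arguments are forwarded to the model's forward, so a more explicit version should also work (untested sketch):

```python
# Sketch, assuming torchinfo forwards extra kwargs to model.forward:
# name the decoder input explicitly instead of relying on position.
summary = torchinfo.summary(
    model,
    input_data=input_data,          # becomes input_ids
    decoder_input_ids=input_data,   # forwarded to T5's forward
    device="cpu",
)
```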
Thanks again for your help!
Hi,

I am trying to summarise a model from the HuggingFace Hub and I get the RuntimeError quoted at the top of this thread.

Is it just that some types of model are unsupported, or am I doing something wrong?

Thanks :)