amazon-science / mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
https://arxiv.org/abs/2302.00923
Apache License 2.0
3.8k stars 314 forks

Demo usage #24

Open RamAnanth opened 1 year ago

RamAnanth commented 1 year ago

Thank you for the great work on Multimodal Chain of Thought and for open-sourcing the code! The results are really impressive. Is there a Colab notebook or example script for trying this work on demo images and text?

rraulison commented 1 year ago

Please share it on Colab!

0xZXDX commented 1 year ago

mark

mkygogo commented 1 year ago

Can this be used to build a chatbot?

Movelocity commented 1 year ago

Here it is: Open In Colab

antonioRVR commented 1 year ago

Thank you for the demo in Colab. Unfortunately, I got an error. [image]

roapple10 commented 1 year ago

Thank you for the demo in Colab. Unfortunately, I got an error. [image]

Try this

    from model import T5ForMultimodalGeneration
    from transformers import T5Tokenizer

    patch_size = (100, 256)  # for DETR style
    save_dir = "./models/MM-CoT-UnifiedQA-base-Rationale"
    tokenizer = T5Tokenizer.from_pretrained(save_dir)
    padding_idx = tokenizer._convert_token_to_id(tokenizer.pad_token)
    model = T5ForMultimodalGeneration.from_pretrained(
        save_dir, patch_size=patch_size, padding_idx=padding_idx, save_dir=save_dir
    ).cuda()

AIAnytime commented 1 year ago

Hi, I tried the Colab notebook, but I am getting the following error: TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType

Below is the full error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_8332\3201768157.py in
    ----> 1 outputs = model.generate(input_ids, max_length=512) # reads the vision feature if file detacted
          2 show_result(outputs)
          3 #outputs

    ~\anaconda3\lib\site-packages\torch\autograd\grad_mode.py in decorate_context(*args, **kwargs)
         25         def decorate_context(*args, **kwargs):
         26             with self.clone():
    ---> 27                 return func(*args, **kwargs)
         28         return cast(F, decorate_context)
         29

    ~\anaconda3\lib\site-packages\transformers\generation\utils.py in generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, **kwargs)
       1389
       1390             # 11. run greedy search
    -> 1391             return self.greedy_search(
       1392                 input_ids,
       1393                 logits_processor=logits_processor,

    ~\anaconda3\lib\site-packages\transformers\generation\utils.py in greedy_search(self, input_ids, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
       2177
       2178             # forward pass to get next token
    -> 2179             outputs = self(
       2180                 **model_inputs,
       2181                 return_dict=True,

    ~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1193                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1194             return forward_call(*input, **kwargs)
       1195         # Do not call functions when jit is used
       1196         full_backward_hooks, non_full_backward_hooks = [], []

    ~\Desktop\My Projects\mm-cot\model.py in forward(self, input_ids, image_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
        116             hidden_states = encoder_outputs[0]
        117
    --> 118         image_embedding = self.image_dense(image_ids)
        119         image_att, _ = self.mha_layer(hidden_states, image_embedding, image_embedding)
        120

    ~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
       1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1193                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1194             return forward_call(*input, **kwargs)
       1195         # Do not call functions when jit is used
       1196         full_backward_hooks, non_full_backward_hooks = [], []

    ~\anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
        112
        113     def forward(self, input: Tensor) -> Tensor:
    --> 114         return F.linear(input, self.weight, self.bias)
        115
        116     def extra_repr(self) -> str:

    TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType

Is the issue with the vision features? Can anyone help me debug this?

WeixuanXiong commented 1 year ago

Have you solved this issue? I hit the same bug. It seems that the vision features are not being passed to the model.
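The failure mode described here can be illustrated with a small, dependency-free sketch. It does not use the real `T5ForMultimodalGeneration`: `ToyMultimodalModel`, `placeholder_features`, and the assumption that `generate()` forwards extra keyword arguments to `forward()` are all hypothetical stand-ins. In the actual repository, whether `image_ids` reaches `forward()` depends on how the model and the notebook wire it through. The sketch only shows why `image_ids=None` surfaces as an error deep inside a linear layer, and how an explicit check can fail earlier with a clearer message.

```python
from typing import List, Optional, Sequence, Tuple

# (num_patches, feature_dim) for DETR-style features, as quoted earlier in the thread.
PATCH_SIZE: Tuple[int, int] = (100, 256)


class ToyMultimodalModel:
    """Hypothetical stand-in; NOT the real T5ForMultimodalGeneration."""

    def forward(
        self,
        input_ids: Sequence[int],
        image_ids: Optional[List[List[float]]] = None,
    ) -> Tuple[int, int, int]:
        if image_ids is None:
            # Fail early with an actionable message instead of inside a linear layer.
            raise ValueError(
                "image_ids is None: vision features were not passed to the model"
            )
        # A real model would project the features and attend over them here.
        return len(input_ids), len(image_ids), len(image_ids[0])

    def generate(self, input_ids: Sequence[int], **model_kwargs):
        # Assumption: extra keyword arguments are forwarded to forward().
        return self.forward(input_ids, **model_kwargs)


def placeholder_features(patch_size: Tuple[int, int] = PATCH_SIZE) -> List[List[float]]:
    """All-zero features, a hypothetical fallback for inputs without an image."""
    rows, cols = patch_size
    return [[0.0] * cols for _ in range(rows)]


model = ToyMultimodalModel()
input_ids = [101, 102, 103]

try:
    model.generate(input_ids)  # mirrors calling generate() with text only
except ValueError as err:
    print("failed early:", err)

# Passing features explicitly avoids the None input.
print(model.generate(input_ids, image_ids=placeholder_features()))
```

With a real model, the analogous check is to confirm that the vision features are actually loaded and handed to the model before calling `generate()`, rather than relying on them being picked up implicitly.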

tahsintahsin commented 1 year ago

I have the same error: TypeError: linear(): argument 'input' (position 1) must be Tensor, not NoneType. Are there any updates on this issue?

SSOODDAA commented 5 months ago

I have the same question. Have you solved it?