EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

RuntimeError: The expanded size of the tensor (1) must match the existing size (10) at non-singleton dimension 2 #870

Closed — crazyofapple closed this issue 1 year ago

crazyofapple commented 1 year ago

Describe the bug

```
RuntimeError: The expanded size of the tensor (1) must match the existing size (10) at non-singleton dimension 2. Target sizes: [1, 4, 1, 10]. Tensor sizes: [1, 1, 10, 10]
  File "generate.py", line 59, in main
    generate_samples_input_from_file(
  File "/share/home/gpt-neox/megatron/text_generation_utils.py", line 620, in generate_samples_input_from_file
    generated_texts = generate_samples_from_prompt(
  File "/share/home/gpt-neox/megatron/text_generation_utils.py", line 485, in generate_samples_from_prompt
    for (
  File "/share/home/gpt-neox/megatron/text_generation_utils.py", line 316, in stream_tokens
    logits = forward_model(model, model_inputs, neox_args.is_pipe_parallel)
  File "/share/home/gpt-neox/megatron/text_generation_utils.py", line 137, in forward_model
    return model.module(model_inputs)
  File "/share/home/.conda/envs/gpt-neox/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/ldf/gpt-neox/megatron/model/utils.py", line 168, in forward
    x = func(forward_input)
  File "/share/home/ldf/gpt-neox/megatron/model/utils.py", line 161, in exec_func
    inputs = layer(inputs)
  File "/share/home/.conda/envs/gpt-neox/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/ldf/gpt-neox/megatron/model/transformer.py", line 807, in forward
    return super().forward(hidden_states, attention_mask), attention_mask
  File "/share/home/ldf/gpt-neox/megatron/model/transformer.py", line 769, in forward
    attention_output, attention_bias = self.attention(
  File "/share/home/.conda/envs/gpt-neox/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/ldf/gpt-neox/megatron/model/transformer.py", line 609, in forward
    context_layer = self.attention(
  File "/share/home/ldf/gpt-neox/megatron/model/transformer.py", line 391, in attention
    attention_probs = self.scale_mask_softmax(attention_scores, attention_mask)
  File "/share/home/.conda/envs/gpt-neox/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/share/home/ldf/gpt-neox/megatron/model/fused_softmax.py", line 146, in forward
    return self.forward_torch_softmax(input, mask)
  File "/share/home/ldf/gpt-neox/megatron/model/fused_softmax.py", line 190, in forward_torch_softmax
    mask_output = self.mask_func(input, mask) if mask is not None else input
  File "/share/home/ldf/gpt-neox/megatron/model/gpt2_model.py", line 48, in gpt2_attention_mask_func
    attention_scores.masked_fill_(ltor_mask, -10000.0)
RuntimeError: The expanded size of the tensor (1) must match the existing size (10) at non-singleton dimension 2. Target sizes: [1, 4, 1, 10]. Tensor sizes: [1, 1, 10, 10]
wandb: Waiting for W&B process to finish... (failed 1).
```
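For context, the failure reduces to a broadcasting mismatch between the attention scores and the causal mask passed to `masked_fill_`. A minimal sketch with dummy tensors (shapes copied from the traceback above, not real model activations) reproduces the same error:

```python
import torch

# Dummy tensors with the shapes reported in the traceback.
# attention_scores: [batch, heads, query_len, key_len]
attention_scores = torch.randn(1, 4, 1, 10)
# ltor_mask: full [1, 1, seq_len, seq_len] causal mask, not sliced to the current query
ltor_mask = torch.ones(1, 1, 10, 10, dtype=torch.bool)

# In-place masked_fill_ requires the mask to be broadcastable to the scores' shape.
# Dimension 2 is 1 in the scores but 10 in the mask, so expansion fails with:
#   RuntimeError: The expanded size of the tensor (1) must match the existing
#   size (10) at non-singleton dimension 2.
attention_scores.masked_fill_(ltor_mask, -10000.0)
```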

To Reproduce

```
python deepy.py generate.py -d configs 6-7B.yml slurm_local.yml text_generation.yml
```

Environment (please complete the following information):

Stormcode1 commented 1 year ago

Were you able to figure out the issue? I've been running into the same error with different datasets and have been going nuts trying to track it down.

TissueC commented 1 year ago

Same here. I'd appreciate it if anyone could share more information.

TissueC commented 1 year ago

I've looked into it: the error occurs whenever input_sample.txt contains multiple (>1) lines and generation is run. What an absurd bug.

TissueC commented 1 year ago

Reverting this commit fixes the bug: https://github.com/EleutherAI/gpt-neox/commit/17b84d75c295d25e92807de1c9dad22fa9a79fd2
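For anyone who can't revert, a possible local workaround is to slice the causal mask down to the scores' query/key window before applying it, so `masked_fill_` can broadcast. This is an untested sketch, not the project's fix, and it assumes the length-1 query corresponds to the last position of the cached sequence:

```python
# Hypothetical workaround sketch for megatron/model/gpt2_model.py
# (the supported fix is reverting the commit linked above).
def gpt2_attention_mask_func(attention_scores, ltor_mask):
    # attention_scores: [batch, heads, query_len, key_len]
    # ltor_mask:        [1, 1, seq_len, seq_len] boolean causal mask
    query_len, key_len = attention_scores.size(2), attention_scores.size(3)
    # Keep only the rows/columns that line up with the current scores.
    ltor_mask = ltor_mask[..., -query_len:, :key_len]
    attention_scores.masked_fill_(ltor_mask, -10000.0)
    return attention_scores
```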