Input tensor at index 2 has invalid shape [2, 2, 12, 1024, 64], but expected [2, 3, 12, 1024, 64]

I get this error when training on three GPUs. After searching around, I found a report from someone using four GPUs who hit RuntimeError: Input tensor at index 3 has invalid shape [2, 2, 16, 128, 64] but expected [2, 4, 16, 128, 64]. So I switched back to training on four GPUs, and strangely it ran without errors. I don't know why, though.

args:
Namespace(batch_size=8, device='5,6,1,4', epochs=5, fp16=False, fp16_opt_level='O1', gradient_accumulation=1, log_step=1, lr=0.00015, max_grad_norm=1.0, model_config='config/model_config_small.json', num_pieces=100, output_dir='model/', pretrained_model='', raw=False, raw_data_path='data/data/doupo/train.json', segment=False, stride=768, tokenized_data_path='data/tokenized/', tokenizer_path='cache/vocab_small.txt', warmup_steps=2000)

config:
{
  "attn_pdrop": 0.1,
  "embd_pdrop": 0.1,
  "finetuning_task": null,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_layer": 10,
  "n_positions": 1024,
  "num_labels": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torchscript": false,
  "use_bfloat16": false,
  "vocab_size": 13317
}

using device: cuda
calculating total steps
100%|████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 92.82it/s]
total steps = 3914
Let's use 4 GPUs!
starting training
epoch 1
time: 2023-01-13 11:48:51.538218
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/torch/nn/parallel/functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/transformers/optimization.py:166: UserWarning: This overload of add is deprecated:
  add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
  add(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)
  expavg.mul(beta1).add_(1.0 - beta1, grad)
now time: 11:49. Step 1 of piece 0 of epoch 1, loss 9.667740821838379
now time: 11:49. Step 2 of piece 0 of epoch 1, loss 9.682665824890137
now time: 11:49. Step 3 of piece 0 of epoch 1, loss 9.685418128967285
now time: 11:49. Step 4 of piece 0 of epoch 1, loss 9.6702299118042
now time: 11:49. Step 5 of piece 0 of epoch 1, loss 9.668827056884766
now time: 11:49. Step 6 of piece 0 of epoch 1, loss 9.66973876953125
now time: 11:49. Step 7 of piece 0 of epoch 1, loss 9.65914535522461
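A possible explanation, sketched here under assumptions rather than confirmed against the training script: the "Let's use 4 GPUs!" line suggests the model is wrapped in torch.nn.DataParallel, which splits each batch along dimension 0 the way torch.chunk does and then gathers the per-GPU outputs along dimension 0. With "output_past": true the model also returns cached key/value tensors whose batch axis appears to sit at dimension 1 (shape [2, batch, n_head, n_ctx, n_embd / n_head]), so the gather fails whenever the GPUs receive unequal chunks. batch_size=8 over three GPUs gives chunks of 3, 3, 2, which would match the tensor at index 2 having a 2 where a 3 was expected; over four GPUs every chunk is 2. The helper names below are hypothetical, and the pure-Python code only imitates the chunking arithmetic without calling PyTorch.

```python
import math


def dataparallel_chunk_sizes(batch_size, num_gpus):
    """Per-GPU batch sizes, imitating torch.chunk along dim 0.

    Assumption: DataParallel scatters the batch with chunk semantics,
    i.e. every chunk has ceil(batch_size / num_gpus) items except
    possibly the last one.
    """
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def past_shapes(batch_size, num_gpus, n_head=12, n_ctx=1024, n_embd=768):
    """Shape of one cached key/value tensor on each GPU (values from the config)."""
    head_dim = n_embd // n_head
    return [[2, b, n_head, n_ctx, head_dim]
            for b in dataparallel_chunk_sizes(batch_size, num_gpus)]


def gather_is_consistent(shapes):
    """Gathering along dim 0 needs every other dimension to agree."""
    return all(s[1:] == shapes[0][1:] for s in shapes)


if __name__ == "__main__":
    for gpus in (3, 4):
        shapes = past_shapes(batch_size=8, num_gpus=gpus)
        print(gpus, "GPUs:", shapes, "gather ok:", gather_is_consistent(shapes))
```

If this reading is right, choosing a batch_size that is divisible by the number of GPUs (for example 6 or 9 on three cards) should avoid the mismatch.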

Morizeyao / GPT2-Chinese

Input tensor at index 2 has invalid shape [2, 2, 12, 1024, 64], but expected [2, 3, 12, 1024, 64] #264