Closed Ant0082 closed 1 year ago
The process is killed by the operating system. Your CPU memory (not GPU memory) may not be enough to hold the optimizer state when CPU offload is enabled.
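A quick back-of-envelope check (my own sketch, not from the thread) shows why host RAM runs out: with Adam and ZeRO CPU offload, the offloaded state is roughly fp32 master weights + momentum + variance (12 bytes/param), plus about 4 more bytes/param if fp32 gradients are offloaded too.

```python
def offloaded_optimizer_bytes(n_params, bytes_per_param=16):
    """Rough host-RAM footprint of offloaded Adam state.

    bytes_per_param = 4 (fp32 master) + 4 (momentum) + 4 (variance)
    + 4 (fp32 gradients), all offloaded to CPU under ZeRO offload.
    """
    return n_params * bytes_per_param

# A 10B-parameter model like GLM-10B needs on the order of:
gib = offloaded_optimizer_bytes(10e9) / 2**30
print(f"~{gib:.0f} GiB of CPU RAM")  # ~149 GiB
```

If the machine has less free RAM than this estimate, the OOM killer terminating the training process is the expected outcome.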
It works with stage 3, but the following error is reported:
size mismatch for transformer.layers.44.post_attention_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.44.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.44.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.44.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.44.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.input_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.attention.query_key_value.bias: copying a param with shape torch.Size([12288]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.attention.dense.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.attention.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.post_attention_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.post_attention_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.45.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.input_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.attention.query_key_value.bias: copying a param with shape torch.Size([12288]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.attention.dense.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.attention.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.post_attention_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.post_attention_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.46.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.input_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.attention.query_key_value.bias: copying a param with shape torch.Size([12288]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.attention.dense.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.attention.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.post_attention_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.post_attention_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.layers.47.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.final_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for transformer.final_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
[2023-01-05 11:31:11,109] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 36256
[2023-01-05 11:31:15,802] [ERROR] [launch.py:292:sigkill_handler] ['/data/qin/miniconda3/envs/bmb_env/bin/python', '-u', 'finetune_glm.py', '--local_rank=3', '--deepspeed', '--deepspeed_config', 'config_tasks/config_blocklm_10B_cnndm.json', '--finetune', '--experiment-name', 'GLM-10B-chinese-customization_01-05-11-18', '--task', 'customization', '--data-dir', 'data/customization', '--save', 'ckpt/debug_/finetune_checkpoints', '--checkpoint-activations', '--num-workers', '1', '--no-load-lr-scheduler', '--block-lm', '--cloze-eval', '--task-mask', '--num-layers', '48', '--hidden-size', '4096', '--num-attention-heads', '64', '--max-position-embeddings', '1024', '--tokenizer-type', 'ChineseSPTokenizer', '--load-pretrained', '/data/qst/code/GLM/ckpt/glm-10b-chinese', '--epochs', '10', '--lr', '1e-5', '--lr-decay-style', 'linear', '--warmup', '0.06', '--label-smoothing', '0.1', '--save-interval', '10000', '--log-interval', '50', '--eval-interval', '1000', '--eval-iters', '100', '--eval-epoch', '2', '--src-seq-length', '512', '--tgt-seq-length', '128', '--min-tgt-length', '55', '--length-penalty', '0.7', '--no-repeat-ngram-size', '3', '--num-beams', '5', '--select-topk', '--eval-batch-size', '1', '--fp16', '--model-parallel-size', '1', '--overwrite'] exits with return code = 1
the shape in current model is torch.Size([0]).
This is because, under stage 3, the model weights are partitioned across multiple GPUs and the state_dict contains only placeholders. I cannot find any instructions on how to load a checkpoint that was saved without stage 3. Maybe you can ask the question in the DeepSpeed repo.
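The failure mode can be reproduced with plain PyTorch (a minimal sketch, assuming only that stage 3 leaves zero-sized placeholder parameters on each rank, as the log above shows): a strict `load_state_dict` of a full checkpoint into a module whose parameters are empty raises exactly this "size mismatch ... torch.Size([0])" error.

```python
import torch
import torch.nn as nn

# Stand-in for the full (unpartitioned) checkpoint.
full = nn.Linear(4, 4)
ckpt = full.state_dict()

# Simulate a ZeRO stage-3 module: parameters are empty placeholders,
# the real shards live elsewhere.
sharded = nn.Linear(4, 4)
sharded.weight = nn.Parameter(torch.empty(0))
sharded.bias = nn.Parameter(torch.empty(0))

try:
    sharded.load_state_dict(ckpt)  # strict=True by default
    ok = False
except RuntimeError as e:
    # Same message as in the log: "size mismatch for weight: copying a
    # param with shape torch.Size([4, 4]) ... current model is torch.Size([0])."
    ok = "size mismatch" in str(e)
print(ok)  # True
```

This is why the fix is not on the checkpoint side: the stage-3 module has to gather (or be given) the real shards before a full state_dict can be copied in.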
I got the same problem. @Ant0082 May I ask how you solved it?
Run the code with a stage 2 config. And https://item.jd.com/100038704859.html
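For reference, a minimal ZeRO stage 2 fragment might look like the following (an illustrative sketch, not the repo's actual `config_tasks/config_blocklm_10B_cnndm.json`; batch sizes are placeholders):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
```

Under stage 2 only optimizer states and gradients are partitioned, so the module keeps full-shaped parameters and a full checkpoint loads without the size-mismatch errors above.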
oh, thanks...
bash scripts/ds_finetune_seq2seq.sh config_tasks/model_blocklm_10B_chinese.sh config_tasks/seq_customization.sh