load_huggingface_model failed on rugpt3large_based_on_gpt2 ```RuntimeError: The size of tensor a (50264) must match...```

IvanAntipov commented 3 years ago

I am trying to reproduce the fine-tuning process for rugpt3large with DeepSpeed and Apex.

I managed to fine-tune rugpt3small.

But when I run the same script with the large configuration, I get the following error:

R0/1: Loaded 49 examples, 100352 tokens
> padded vocab (size: 50257) with 7 dummy tokens (new size: 50264)
> end-of-document token: 0
building GPT3 model ...
Load huggingface model from sberbank-ai/rugpt3large_based_on_gpt2
Downloading: 100%|██████████| 609/609 [00:00<00:00, 636kB/s]
Downloading: 100%|██████████| 3.14G/3.14G [01:02<00:00, 50.0MB/s]
Traceback (most recent call last):
  File "ru-gpts/pretrain_gpt3.py", line 830, in <module>
    main()
  File "ru-gpts/pretrain_gpt3.py", line 786, in main
    model, optimizer, lr_scheduler = setup_model_and_optimizer(args)
  File "ru-gpts/pretrain_gpt3.py", line 177, in setup_model_and_optimizer
    model = get_model(args)
  File "ru-gpts/pretrain_gpt3.py", line 78, in get_model
    model = load_huggingface_model(model, args.load_huggingface, args.huggingface_double_pos_embeddings)
  File "/notebooks/ru-gpts_apex/ru-gpts/src/utils.py", line 474, in load_huggingface_model
    move_weights(model2fill, h_model, double_pos_embeddings)
  File "/notebooks/ru-gpts_apex/ru-gpts/src/utils.py", line 454, in move_weights
    load_weights(transformer_model.wte, our.word_embeddings, dst2src)
  File "/notebooks/ru-gpts_apex/ru-gpts/src/utils.py", line 421, in load_weights
    load.copy_(data)
RuntimeError: The size of tensor a (50264) must match the size of tensor b (50257) at non-singleton dimension 0
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------

My configuration


MP_SIZE=1
# Change for multinode config
NUM_GPUS_PER_WORKER=1

gpt_options=" \
       --load-huggingface sberbank-ai/rugpt3large_based_on_gpt2 \
       --train-data-path "train.list" \
       --test-data-path "valid.list" \
       --logging-dir=log/ \
       --save model \
       --save-interval 1000 \
       --model-parallel-size ${MP_SIZE} \
       --num-layers 24 \
       --hidden-size 1536 \
       --num-attention-heads 16 \
       --batch-size 1 \
       --seq-length 2048 \
       --max-position-embeddings 2048 \
       --train-iters 200000 \
       --resume-dataloader \
       --distributed-backend nccl \
       --lr 0.00015 \
       --lr-decay-style cosine \
       --weight-decay 1e-2 \
       --warmup .01 \
       --log-interval 100 \
       --fp16 \
       --checkpoint-activations \
       --deepspeed-activation-checkpointing \
       --deepspeed \
       --deepspeed_config ru-gpts/src/deepspeed_config/gpt3_large_2048.json \
"

USE_DEEPSPEED=1 mpirun --allow-run-as-root --np ${NUM_GPUS_PER_WORKER} python ru-gpts/pretrain_gpt3.py $@ ${gpt_options}

I tried different transformers versions (transformers==3.5.0 and transformers==4.3.0), but the result is the same.

P.S. My Apex installation differs slightly from the one in the Finetune_and_generate_RuGPTs_deepspeed_megatron.ipynb example, because I had to install it with an NVIDIA container; otherwise it didn't work.

Artyrm commented 3 years ago

Same here.

MolchanovArt commented 3 years ago

I have exactly the same error with RuGPT-3 Medium. @IvanAntipov, what configuration are you using (CUDA, torch, triton)?

MolchanovArt commented 3 years ago

I added a new parameter:

--make-vocab-size-divisible-by 1

And it works.
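For context, the log in the first comment shows the vocabulary being padded from 50257 to 50264 entries, while the Hugging Face checkpoint has 50257 embedding rows, which matches the sizes in the RuntimeError. Below is a minimal sketch of that arithmetic. It assumes the training script rounds the vocabulary up to a multiple of the divisor times the model-parallel size, and that the default divisor is 8 (inferred from the log, not checked against the ru-gpts source). `pad_vocab_size` is a hypothetical helper written for illustration, not a function from the repository.

```python
def pad_vocab_size(orig_vocab_size: int, divisible_by: int, model_parallel_size: int = 1) -> int:
    """Round the vocabulary size up to the next multiple of
    divisible_by * model_parallel_size (assumed padding rule)."""
    multiple = divisible_by * model_parallel_size
    remainder = orig_vocab_size % multiple
    if remainder == 0:
        return orig_vocab_size
    return orig_vocab_size + (multiple - remainder)


if __name__ == "__main__":
    hf_vocab_size = 50257  # rows in the Hugging Face embedding matrix, per the error message
    for divisor in (8, 1):  # 8 is an assumed default; 1 is the value passed in the fix
        padded = pad_vocab_size(hf_vocab_size, divisor)
        status = "matches" if padded == hf_vocab_size else "does NOT match"
        print(f"divisible_by={divisor}: padded size {padded} {status} the checkpoint")
```

Under these assumptions, a divisor of 1 leaves the vocabulary at 50257, so the embedding copy in `load_weights` sees tensors of equal size.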

IvanAntipov commented 3 years ago

> I've exactly the same with RuGPT-3 Medium. @IvanAntipov what configuration are you using (cuda, torch, triton)?

I think it is no longer relevant, but nevertheless:

CUDA 11.2
torch==1.5.0
No triton

Artyrm commented 3 years ago

@MolchanovArt Thank you kindly. I finally managed to get past it.

But then I stumbled upon that pesky RuntimeError: CUDA: Error- invalid ptx (https://github.com/sberbank-ai/ru-gpts/issues/62) in Colab, even with the medium model. I'm trying to use cpu_offload to fit the large model in GPU RAM.

Just for the record, I have:

print(torch.__version__)
!/usr/local/cuda/bin/nvcc --version | grep cuda

torch: 1.7.0+cu110
Cuda: Build cuda_11.0_bu.TC445_37.28845127_0
triton: 0.2.3

king-menin commented 3 years ago

Solved by https://github.com/sberbank-ai/ru-gpts/issues/69#issuecomment-891336114

ai-forever / ru-gpts