abhinand5 / tamil-llama

A New Tamil Large Language Model (LLM) Based on Llama 2
GNU General Public License v3.0

Getting an error when pretraining #9

Open sazzad1779 opened 5 months ago

sazzad1779 commented 5 months ago

I'm getting this error from my training script:

File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py", line 356, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (32768x4096 and 1x8388608)
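For reference, the sizes in the error line up with the training settings below: 32768 = 64 (per_device_train_batch_size) x 512 (block_size) tokens, and 8388608 = 4096 x 4096 / 2, which matches the size of a 4-bit-packed 4096x4096 projection weight. A minimal arithmetic check, assuming the Llama 2 7B hidden size of 4096:

hidden_size = 4096                       # assumption: Llama 2 7B
packed = hidden_size * hidden_size // 2  # bitsandbytes packs two 4-bit weights per uint8 byte
print(packed)                            # 8388608 -> the 1x8388608 operand in the error
print(64 * 512)                          # 32768   -> rows of the 32768x4096 operand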

Here is my command:

!torchrun --nnodes 1 --nproc_per_node 1 run_clm_with_peft.py \
    --deepspeed ds_zero2_no_offload.json \
    --model_name_or_path LLAMA_models \
    --tokenizer_name_or_path   tokenizer_model_path \
    --dataset_dir /content/small_chunk_data/business \
    --data_cache_dir llama2_pretrain/cache \
    --validation_split_percentage 0.1 \
    --per_device_train_batch_size 64 \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --max_steps 200 \
    --lr_scheduler_type cosine \
    --learning_rate 2e-4 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 1 \
    --save_steps 50 \
    --gradient_accumulation_steps 2 \
    --preprocessing_num_workers 8 \
    --block_size 512 \
    --output_dir llama2_pretrain/result \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank 64 \
    --lora_alpha 128 \
    --trainable q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj \
    --lora_dropout 0.05 \
    --modules_to_save  embed_tokens,lm_head  \
    --torch_dtype float16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False \
    --flash_attn True \
    --load_in_kbits 4 \
    # --resume_from_checkpoint ${output_dir}/checkpoint-300
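
For clarity, the LoRA-related flags above should correspond to roughly the following peft configuration (an illustrative sketch only; run_clm_with_peft.py builds its own config, so the exact field names may differ):

from peft import LoraConfig, TaskType

# Illustrative mapping of the CLI flags above to a standard LoraConfig;
# this is an assumption about what the script does internally.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,                          # --lora_rank 64
    lora_alpha=128,                # --lora_alpha 128
    lora_dropout=0.05,             # --lora_dropout 0.05
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],     # --trainable
    modules_to_save=["embed_tokens", "lm_head"],              # --modules_to_save
)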

And here is my requirements list:

accelerate==0.25.0 \
aiohttp==3.9.1 \
aiosignal==1.3.1 \
annotated-types==0.6.0 \
appdirs==1.4.4 \
async-timeout==4.0.3 \
attrs==23.2.0 \
bitsandbytes==0.41.1 \
certifi==2023.11.17 \
charset-normalizer==3.3.2 \
click==8.1.7 \
datasets==2.16.1 \
deepspeed==0.12.6 \
dill==0.3.7 \
docker-pycreds==0.4.0 \
docstring-parser==0.15 \
einops==0.7.0 \
flash-attn==2.4.2 \
frozenlist==1.4.1 \
fsspec==2023.10.0 \
gitdb==4.0.11 \
gitpython==3.1.40 \
hjson==3.1.0 \
huggingface-hub==0.20.2 \
idna==3.6 \
joblib==1.3.2 \
markdown-it-py==3.0.0 \
mdurl==0.1.2 \
multidict==6.0.4 \
multiprocess==0.70.15 \
ninja==1.11.1.1 \
numpy==1.26.3 \
packaging==23.2 \
pandas==2.1.4 \
protobuf==4.25.1 \
psutil==5.9.7 \
py-cpuinfo==9.0.0 \
pyarrow==14.0.2 \
pyarrow-hotfix==0.6 \
pydantic==2.5.3 \
pydantic-core==2.14.6 \
pygments==2.17.2 \
pynvml==11.5.0 \
python-dateutil==2.8.2 \
pytz==2023.3.post1 \
pyyaml==6.0.1 \
regex==2023.12.25 \
requests==2.31.0 \
rich==13.7.0 \
safetensors==0.4.1 \
scikit-learn==1.3.2 \
scipy==1.11.4 \
sentencepiece==0.1.99 \
sentry-sdk==1.39.1 \
setproctitle==1.3.3 \
shtab==1.6.5 \
six==1.16.0 \
smmap==5.0.1 \
threadpoolctl==3.2.0 \
tokenizers==0.15.0 \
tqdm==4.66.1 \
trl==0.7.8 \
tyro==0.6.3 \
tzdata==2023.4 \
urllib3==2.1.0 \
wandb==0.16.1 \
xxhash==3.4.1 \
yarl==1.9.4 

!pip install git+https://github.com/abhinand5/transformers.git@abhinand5-deepspeed-patch
!pip install git+https://github.com/huggingface/peft.git@13e53fc

How can I solve this?

abhinand5 commented 5 months ago

Hi @sazzad1779, I just want a few more details:

  1. Which model are you using? Make sure the model directory contains the files from one of the Llama 2 models - example (except safetensors).
  2. What tokenizer are you using? The tokenizer must be merged with the original Llama tokenizer and saved in Hugging Face format; you can't use the SentencePiece model directly.
  3. This shape -> 1x8388608 is very weird. Are you making any changes to the original model? (A quick check is sketched below.)
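
On point 3, one quick way to narrow it down (a minimal sketch, assuming the model is being loaded in 4-bit through bitsandbytes, as --load_in_kbits 4 implies) is to print what the projection layers actually look like after loading:

import bitsandbytes as bnb

def inspect_projections(model):
    # With 4-bit loading, q_proj/k_proj/v_proj/o_proj should show up as
    # bnb.nn.Linear4bit (and, once LoRA is applied, a 4-bit-aware LoRA
    # wrapper). A plain linear layer holding a flat packed uint8 weight
    # would produce exactly the kind of shape mismatch in the traceback.
    for name, module in model.named_modules():
        if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj")):
            print(name, type(module).__name__,
                  "4bit:", isinstance(module, bnb.nn.Linear4bit),
                  "weight:", tuple(module.weight.shape), module.weight.dtype)

If any of these print as a regular Linear with a weight shaped like (8388608, 1), that would explain the error.
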
sazzad1779 commented 5 months ago

@abhinand5