epfLLM / Megatron-LLM

distributed trainer for LLMs

run finetune llama2-7B error #77

Closed: 13416157913 closed this issue 9 months ago

13416157913 commented 9 months ago

Help please, I get the following error:

```
    output_tensor = forward_step(forward_step_func, data_iterator,
  File "/home/dengkaibiao/Megatron-LLM/megatron/schedules.py", line 117, in forward_step
    output_tensor, loss_func = forward_step_func(data_iterator, model)
  File "/home/dengkaibiao/Megatron-LLM/finetune.py", line 223, in forward_step
    batch = get_batch(data_iterator)
  File "/home/dengkaibiao/Megatron-LLM/finetune.py", line 101, in get_batch
    tokenizer = get_tokenizer()
  File "/home/dengkaibiao/Megatron-LLM/megatron/global_vars.py", line 45, in get_tokenizer
    _ensure_var_is_initialized(_GLOBAL_TOKENIZER, 'tokenizer')
  File "/home/dengkaibiao/Megatron-LLM/megatron/global_vars.py", line 198, in _ensure_var_is_initialized
    assert var is not None, '{} is not initialized.'.format(name)
AssertionError: tokenizer is not initialized.
```

This is my finetune script:

```bash
LOG_ARGS="--log_interval 1 --save_interval 100 --eval_interval 50"
TRAIN_ARGS="--train_iters 100 --lr_decay_style cosine --lr_warmup_iters 50 --lr 3e-4 --min_lr 1e-6"
DISTRIBUTED_ARGS="--nproc_per_node 8 --nnodes 1 --node_rank 0 --master_addr localhost --master_port 8000"
COMMON_ARGS="--num_layers 32 --num_attention_heads 32 --seq_length 4096 --max_position_embeddings 4096 --ffn_hidden_size 11008 --hidden_dropout 0.0 --position_embedding_type rotary --no_bias_gelu_fusion --no_bias_dropout_fusion --use_checkpoint_args --attention_dropout 0.0 --adam_beta1 0.9 --adam_beta2 0.95 --adam_eps 1e-5 --layernorm_epsilon 1e-6 --weight_decay 0.1 --sequence_parallel --recompute_granularity selective --log_timers_to_tensorboard --rope_scaling_factor 1.0"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1 \
    --load /Megatron-LLM-sharded-weights \
    --save /Megatron-LLM-sharded-weights \
    --tensorboard_dir /Megatron-LLM-sharded-weights/tensorboard/ \
    --data_path /Megatron-LLM/corpus_indexed/china_text_document \
    --split 100,0,0 \
    --model_name llama2 \
    --tokenizer_type SentencePieceTokenizer \
    --make_vocab_size_divisible_by 1 \
    --bf16 \
    --global_batch_size 1000 \
    --micro_batch_size 2 \
    --use_checkpoint_args \
    $COMMON_ARGS $LOG_ARGS $TRAIN_ARGS
```

kylematoba commented 9 months ago

I don't see a --vocab_file argument? Please check your call against the example at https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#training.
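For reference, a minimal sketch of the call above with the missing flag added. The `tokenizer.model` path is an assumption and should point at the Llama 2 SentencePiece vocab that was used when sharding the weights:

```bash
# Sketch only: the change is adding --vocab_file so the SentencePiece tokenizer
# can be built at startup (otherwise get_tokenizer() finds no global tokenizer).
# The path below is an assumption; adjust it to wherever tokenizer.model lives.
TOKENIZER_ARGS="--tokenizer_type SentencePieceTokenizer --vocab_file /Megatron-LLM-sharded-weights/tokenizer.model"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --tensor_model_parallel_size 2 \
    --pipeline_model_parallel_size 1 \
    --load /Megatron-LLM-sharded-weights \
    --save /Megatron-LLM-sharded-weights \
    --tensorboard_dir /Megatron-LLM-sharded-weights/tensorboard/ \
    --data_path /Megatron-LLM/corpus_indexed/china_text_document \
    --split 100,0,0 \
    --model_name llama2 \
    --make_vocab_size_divisible_by 1 \
    --bf16 \
    --global_batch_size 1000 \
    --micro_batch_size 2 \
    --use_checkpoint_args \
    $TOKENIZER_ARGS $COMMON_ARGS $LOG_ARGS $TRAIN_ARGS
```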

13416157913 commented 9 months ago

> I don't see a --vocab_file argument? Please check your call against the example at https://epfllm.github.io/Megatron-LLM/guide/instruction_tuning.html#training.