IEIT-Yuan / Yuan-2.0

Yuan 2.0 Large Language Model

TypeError: 'int' object does not support item assignment #49

Closed: Science2AI-TaoXu closed this issue 9 months ago

Science2AI-TaoXu commented 9 months ago

This is how I set the parameters in run_inference_server_2.1B.sh:

TOKENIZER_MODEL_PATH=/workspace/checkpoints/Yuan2.0-2B/2B/latest_checkpointed_iteration.txt
CHECKPOINT_PATH=/workspace/checkpoints/Yuan2.0-2B/2B/mp_rank_00

The error:

[2023-12-07 02:20:18,091] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
WARNING: overriding default arguments for tokenizer_type:YuanTokenizer with tokenizer_type:YuanTokenizer
setting global batch size to 1
accumulate and all-reduce gradients in fp32 for bfloat16 data type.
using torch.bfloat16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. True
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  add_bias_linear ................................. False
  add_position_embedding .......................... True
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_layernorm_1p .............................. False
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  async_tensor_model_parallel_allreduce ........... False
  attention_dropout ............................... 0.0
  attention_softmax_in_fp32 ....................... False
  barrier_with_L1_time ............................ True
  beam_width ...................................... None
  bert_binary_head ................................ True
  bert_embedder_type .............................. megatron
  bert_load ....................................... None
  bf16 ............................................ True
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ False
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  classes_fraction ................................ 1.0
  clip_grad ....................................... 1.0
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  data_cache_path ................................. None
  data_impl ....................................... infer
  data_parallel_random_init ....................... False
  data_parallel_size .............................. 1
  data_path ....................................... None
  data_per_class_fraction ......................... 1.0
  data_sharding ................................... True
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_num_layers .............................. None
  decoder_seq_length .............................. None
  dino_bottleneck_size ............................ 256
  dino_freeze_last_layer .......................... 1
  dino_head_hidden_size ........................... 2048
  dino_local_crops_number ......................... 10
  dino_local_img_size ............................. 96
  dino_norm_last_layer ............................ False
  dino_teacher_temp ............................... 0.07
  dino_warmup_teacher_temp ........................ 0.04
  dino_warmup_teacher_temp_epochs ................. 30
  distribute_saved_activations .................... False
  distributed_backend ............................. nccl
  distributed_timeout_minutes ..................... 10
  embedding_path .................................. None
  embedding_weights_in_fp32 ....................... False
  empty_unused_memory_level ....................... 0
  encoder_num_layers .............................. 24
  encoder_seq_length .............................. 8192
  end_weight_decay ................................ 0.01
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... None
  exit_interval ................................... None
  exit_on_missing_checkpoint ...................... False
  exit_signal_handler ............................. False
  ffn_hidden_size ................................. 8192
  fim_rate ........................................ 0.5
  fim_spm_rate .................................... 0.5
  finetune ........................................ False
  flash_attn_drop ................................. 0.0
  fp16 ............................................ False
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  fp8_amax_compute_algo ........................... most_recent
  fp8_amax_history_len ............................ 1
  fp8_e4m3 ........................................ False
  fp8_hybrid ...................................... False
  fp8_interval .................................... 1
  fp8_margin ...................................... 0
  fp8_wgrad ....................................... True
  global_batch_size ............................... 1
  gradient_accumulation_fusion .................... True
  head_lr_mult .................................... 1.0
  hidden_dropout .................................. 0.0
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_h ........................................... 224
  img_w ........................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  inference_batch_times_seqlen_threshold .......... 512
  inference_server ................................ True
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  iter_per_epoch .................................. 1250
  kv_channels ..................................... 64
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  length_penalty .................................. 1
  lf_conv2d_group ................................. 1
  lf_conv2d_num_pad ............................... 0
  load ............................................ /workspace/checkpoints/Yuan2.0-2B/2B/mp_rank_00
  local_rank ...................................... None
  log_batch_size_to_tensorboard ................... False
  log_interval .................................... 100
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_memory_to_tensorboard ....................... False
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... False
  log_validation_ppl_to_tensorboard ............... False
  log_world_size_to_tensorboard ................... False
  loss_scale ...................................... None
  loss_scale_window ............................... 1000
  lr .............................................. None
  lr_2nd_period_scaler ............................ 1
  lr_decay_iters .................................. None
  lr_decay_samples ................................ None
  lr_decay_style .................................. linear
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 0
  make_vocab_size_divisible_by .................... 128
  mask_factor ..................................... 1.0
  mask_prob ....................................... 0.15
  mask_type ....................................... random
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 8192
  max_tokens_to_oom ............................... 12000
  memorybuffer_device ............................. None
  merge_file ...................................... None
  micro_batch_size ................................ 1
  min_length ...................................... 0
  min_loss_scale .................................. 1.0
  min_lr .......................................... 0.0
  mmap_warmup ..................................... False
  no_embedding_dropout ............................ False
  no_load_args .................................... None
  no_load_optim ................................... True
  no_load_rng ..................................... True
  no_persist_layer_norm ........................... False
  no_save_optim ................................... None
  no_save_rng ..................................... None
  norm_dtype ...................................... RMSNorm
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_experts ..................................... None
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  out_seq_length .................................. 1024
  output_bert_embeddings .......................... False
  overlap_p2p_comm ................................ False
  override_opt_param_scheduler .................... False
  params_dtype .................................... torch.bfloat16
  patch_dim ....................................... 16
  perform_initialization .......................... True
  pipeline_blocks ................................. [24]
  pipeline_model_parallel_blocks .................. None
  pipeline_model_parallel_method .................. uniform
  pipeline_model_parallel_size .................... 1
  pipeline_model_parallel_split_rank .............. None
  position_embedding_type ......................... rope
  prevent_newline_after_colon ..................... False
  profile ......................................... False
  profile_ranks ................................... [0]
  profile_step_end ................................ 12
  profile_step_start .............................. 10
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  random_seed ..................................... 1234
  rank ............................................ 0
  recompute_granularity ........................... None
  recompute_method ................................ None
  recompute_num_layers ............................ 1
  reset_attention_mask ............................ False
  reset_position_ids .............................. True
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  retro_add_retriever ............................. False
  retro_cyclic_train_iters ........................ None
  retro_encoder_attention_dropout ................. 0.1
  retro_encoder_hidden_dropout .................... 0.1
  retro_encoder_layers ............................ 2
  retro_num_neighbors ............................. 2
  retro_num_retrieved_chunks ...................... 2
  retro_return_doc_ids ............................ False
  retro_workdir ................................... None
  rotary_percent .................................. 1.0
  sample_rate ..................................... 1.0
  save ............................................ None
  save_interval ................................... None
  scatter_gather_tensors_in_pipeline .............. True
  seed ............................................ 7202
  seq_length ...................................... 8192
  sequence_parallel ............................... False
  sft_stage ....................................... False
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  skip_train ...................................... False
  split ........................................... 969, 30, 1
  squared_relu .................................... False
  standalone_embedding_stage ...................... False
  start_weight_decay .............................. 0.01
  swiglu .......................................... True
  swin_backbone_type .............................. tiny
  temperature ..................................... 1.0
  tensor_model_parallel_size ...................... 1
  tensorboard_dir ................................. None
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 1000
  test_data_path .................................. None
  timing_log_level ................................ 0
  timing_log_option ............................... minmax
  titles_data_path ................................ None
  tokenizer_model ................................. None
  tokenizer_model_path ............................ /workspace/checkpoints/Yuan2.0-2B/2B/latest_checkpointed_iteration.txt
  tokenizer_type .................................. YuanTokenizer
  top_k ........................................... 5
  top_p ........................................... 0.0
  top_p_bound ..................................... 0.0
  top_p_decay ..................................... 0.0
  train_data_path ................................. None
  train_iters ..................................... None
  train_reset ..................................... None
  train_samples ................................... None
  transformer_impl ................................ local
  transformer_pipeline_model_parallel_size ........ 1
  untie_embeddings_and_output_weights ............. False
  use_checkpoint_args ............................. False
  use_checkpoint_opt_param_scheduler .............. False
  use_contiguous_buffers_in_local_ddp ............. True
  use_cpu_initialization .......................... None
  use_distributed_optimizer ....................... False
  use_flash_attn .................................. False
  use_lf_gate ..................................... True
  use_one_sent_docs ............................... False
  use_ring_exchange_p2p ........................... False
  use_rotary_position_embeddings .................. False
  valid_data_path ................................. None
  variable_seq_lengths ............................ False
  virtual_pipeline_model_parallel_size ............ None
  vision_backbone_type ............................ vit
  vision_pretraining .............................. False
  vision_pretraining_type ......................... classify
  vocab_extra_ids ................................. 0
  vocab_file ...................................... None
  vocab_size ...................................... None
  weight_decay .................................... 0.01
  weight_decay_incr_style ......................... constant
  world_size ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 1

building YuanTokenizer tokenizer ...
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1946: FutureWarning: Calling LlamaTokenizer.from_pretrained() with the path to a single file or url is deprecated and won't be possible anymore in v5. Use a model identifier or the path to a directory instead.
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/yuan_2.0/tools/run_text_generation_server.py", line 64, in <module>
    initialize_megatron(extra_args_provider=add_text_generate_args,
  File "/workspace/yuan_2.0/megatron/initialize.py", line 50, in initialize_megatron
    set_global_variables(args)
  File "/workspace/yuan_2.0/megatron/global_vars.py", line 93, in set_global_variables
    _ = _build_tokenizer(args)
  File "/workspace/yuan_2.0/megatron/global_vars.py", line 126, in _build_tokenizer
    _GLOBAL_TOKENIZER = build_tokenizer(args)
  File "/workspace/yuan_2.0/megatron/tokenizer/tokenizer.py", line 45, in build_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer_model_path, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2122, in _from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1034, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 706, in _get_config_dict
    config_dict["_commit_hash"] = commit_hash
TypeError: 'int' object does not support item assignment
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3172) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0a0+4136153', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 788, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/run_text_generation_server.py FAILED

Failures:
  <NO_OTHER_FAILURES>

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-07_02:20:21
  host      : b939a7329a28
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3172)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
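The traceback already hints at the root cause: tokenizer_model_path points at latest_checkpointed_iteration.txt, a checkpoint metadata file that holds only an iteration number. transformers treats whatever from_pretrained receives as a config source, parses the file as JSON, gets back an int instead of a dict, and then fails when attaching _commit_hash. A minimal sketch of that failure mode (the file content "1" is an assumption, based on the iter_0000001 directory shown later in the thread):

import json

# latest_checkpointed_iteration.txt stores only the iteration number of the
# last saved checkpoint, e.g. the single line "1" (assumed here). Parsed as
# JSON it yields an int, not a dict of config keys:
config_dict = json.loads("1")

try:
    # transformers' PretrainedConfig._get_config_dict does roughly this after
    # parsing the "config" file it was pointed at:
    config_dict["_commit_hash"] = None
except TypeError as err:
    print(err)  # 'int' object does not support item assignment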
Science2AI-TaoXu commented 9 months ago

The error occurs when running bash examples/run_inference_server_2.1B.sh.

zhaoxudong01 commented 9 months ago

The tokenizer path is not specified correctly; it needs to point to the ./tokenizer/ directory, and CHECKPOINT_PATH=/workspace/checkpoints/Yuan2.0-2B/2B

Science2AI-TaoXu commented 9 months ago

I tried that, but it did not help; I get the same error.

zhaoxudong01 commented 9 months ago

Judging from the error, the tokenizer path is indeed specified incorrectly. Could you please share the launch script and the log after changing the path?


Science2AI-TaoXu commented 9 months ago

The modified paths:

TOKENIZER_MODEL_PATH=/workspace/checkpoints/Yuan2.0-2B/2B/latest_checkpointed_iteration.txt
CHECKPOINT_PATH=/workspace/checkpoints/Yuan2.0-2B/2B/

I downloaded the weights from OpenXLab and noticed the directory layout is not quite the same as the other source (Baidu Netdisk), so I also tried rearranging it to match. The files are as follows:

/workspace/checkpoints/Yuan2.0-2B
├── 2B
│   ├── iter_0000001
│   │   └── mp_rank_00
│   │       └── model_optim_rng.pt
│   └── latest_checkpointed_iteration.txt
├── 2B.zip
├── LICENSE
└── LICENSE-Yuan
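For reference, a quick sanity check that the two variables point at the right kinds of objects. The paths are the ones from this thread; the snippet is only an illustrative sketch, not part of the Yuan-2.0 scripts:

import os

ckpt = "/workspace/checkpoints/Yuan2.0-2B/2B"

# CHECKPOINT_PATH should be the directory holding
# latest_checkpointed_iteration.txt and the iter_*/mp_rank_*/ tree:
print(os.path.isfile(os.path.join(ckpt, "latest_checkpointed_iteration.txt")))  # expect True
print(os.path.isfile(os.path.join(ckpt, "iter_0000001", "mp_rank_00", "model_optim_rng.pt")))  # expect True

# TOKENIZER_MODEL_PATH must be a directory of tokenizer files; pointing it at
# the iteration-number file above is exactly the misconfiguration here:
print(os.path.isdir(os.path.join(ckpt, "latest_checkpointed_iteration.txt")))  # False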

The error log is as follows:

[2023-12-13 08:27:15,460] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
WARNING: overriding default arguments for tokenizer_type:YuanTokenizer with tokenizer_type:YuanTokenizer
setting global batch size to 1
accumulate and all-reduce gradients in fp32 for bfloat16 data type.
using torch.bfloat16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. True
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  add_bias_linear ................................. False
  add_position_embedding .......................... True
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_layernorm_1p .............................. False
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  async_tensor_model_parallel_allreduce ........... False
  attention_dropout ............................... 0.0
  attention_softmax_in_fp32 ....................... False
  barrier_with_L1_time ............................ True
  beam_width ...................................... None
  bert_binary_head ................................ True
  bert_embedder_type .............................. megatron
  bert_load ....................................... None
  bf16 ............................................ True
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ False
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  classes_fraction ................................ 1.0
  clip_grad ....................................... 1.0
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  data_cache_path ................................. None
  data_impl ....................................... infer
  data_parallel_random_init ....................... False
  data_parallel_size .............................. 1
  data_path ....................................... None
  data_per_class_fraction ......................... 1.0
  data_sharding ................................... True
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_num_layers .............................. None
  decoder_seq_length .............................. None
  dino_bottleneck_size ............................ 256
  dino_freeze_last_layer .......................... 1
  dino_head_hidden_size ........................... 2048
  dino_local_crops_number ......................... 10
  dino_local_img_size ............................. 96
  dino_norm_last_layer ............................ False
  dino_teacher_temp ............................... 0.07
  dino_warmup_teacher_temp ........................ 0.04
  dino_warmup_teacher_temp_epochs ................. 30
  distribute_saved_activations .................... False
  distributed_backend ............................. nccl
  distributed_timeout_minutes ..................... 10
  embedding_path .................................. None
  embedding_weights_in_fp32 ....................... False
  empty_unused_memory_level ....................... 0
  encoder_num_layers .............................. 24
  encoder_seq_length .............................. 8192
  end_weight_decay ................................ 0.01
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... None
  exit_interval ................................... None
  exit_on_missing_checkpoint ...................... False
  exit_signal_handler ............................. False
  ffn_hidden_size ................................. 8192
  fim_rate ........................................ 0.5
  fim_spm_rate .................................... 0.5
  finetune ........................................ False
  flash_attn_drop ................................. 0.0
  fp16 ............................................ False
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  fp8_amax_compute_algo ........................... most_recent
  fp8_amax_history_len ............................ 1
  fp8_e4m3 ........................................ False
  fp8_hybrid ...................................... False
  fp8_interval .................................... 1
  fp8_margin ...................................... 0
  fp8_wgrad ....................................... True
  global_batch_size ............................... 1
  gradient_accumulation_fusion .................... True
  head_lr_mult .................................... 1.0
  hidden_dropout .................................. 0.0
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_h ........................................... 224
  img_w ........................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  inference_batch_times_seqlen_threshold .......... 512
  inference_server ................................ True
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  iter_per_epoch .................................. 1250
  kv_channels ..................................... 64
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  length_penalty .................................. 1
  lf_conv2d_group ................................. 1
  lf_conv2d_num_pad ............................... 0
  load ............................................ /workspace/checkpoints/Yuan2.0-2B/2B/
  local_rank ...................................... None
  log_batch_size_to_tensorboard ................... False
  log_interval .................................... 100
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_memory_to_tensorboard ....................... False
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... False
  log_validation_ppl_to_tensorboard ............... False
  log_world_size_to_tensorboard ................... False
  loss_scale ...................................... None
  loss_scale_window ............................... 1000
  lr .............................................. None
  lr_2nd_period_scaler ............................ 1
  lr_decay_iters .................................. None
  lr_decay_samples ................................ None
  lr_decay_style .................................. linear
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 0
  make_vocab_size_divisible_by .................... 128
  mask_factor ..................................... 1.0
  mask_prob ....................................... 0.15
  mask_type ....................................... random
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 8192
  max_tokens_to_oom ............................... 12000
  memorybuffer_device ............................. None
  merge_file ...................................... None
  micro_batch_size ................................ 1
  min_length ...................................... 0
  min_loss_scale .................................. 1.0
  min_lr .......................................... 0.0
  mmap_warmup ..................................... False
  no_embedding_dropout ............................ False
  no_load_args .................................... None
  no_load_optim ................................... True
  no_load_rng ..................................... True
  no_persist_layer_norm ........................... False
  no_save_optim ................................... None
  no_save_rng ..................................... None
  norm_dtype ...................................... RMSNorm
  num_attention_heads ............................. 32
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_experts ..................................... None
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  out_seq_length .................................. 1024
  output_bert_embeddings .......................... False
  overlap_p2p_comm ................................ False
  override_opt_param_scheduler .................... False
  params_dtype .................................... torch.bfloat16
  patch_dim ....................................... 16
  perform_initialization .......................... True
  pipeline_blocks ................................. [24]
  pipeline_model_parallel_blocks .................. None
  pipeline_model_parallel_method .................. uniform
  pipeline_model_parallel_size .................... 1
  pipeline_model_parallel_split_rank .............. None
  position_embedding_type ......................... rope
  prevent_newline_after_colon ..................... False
  profile ......................................... False
  profile_ranks ................................... [0]
  profile_step_end ................................ 12
  profile_step_start .............................. 10
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  random_seed ..................................... 1234
  rank ............................................ 0
  recompute_granularity ........................... None
  recompute_method ................................ None
  recompute_num_layers ............................ 1
  reset_attention_mask ............................ False
  reset_position_ids .............................. True
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  retro_add_retriever ............................. False
  retro_cyclic_train_iters ........................ None
  retro_encoder_attention_dropout ................. 0.1
  retro_encoder_hidden_dropout .................... 0.1
  retro_encoder_layers ............................ 2
  retro_num_neighbors ............................. 2
  retro_num_retrieved_chunks ...................... 2
  retro_return_doc_ids ............................ False
  retro_workdir ................................... None
  rotary_percent .................................. 1.0
  sample_rate ..................................... 1.0
  save ............................................ None
  save_interval ................................... None
  scatter_gather_tensors_in_pipeline .............. True
  seed ............................................ 12141
  seq_length ...................................... 8192
  sequence_parallel ............................... False
  sft_stage ....................................... False
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  skip_train ...................................... False
  split ........................................... 969, 30, 1
  squared_relu .................................... False
  standalone_embedding_stage ...................... False
  start_weight_decay .............................. 0.01
  swiglu .......................................... True
  swin_backbone_type .............................. tiny
  temperature ..................................... 1.0
  tensor_model_parallel_size ...................... 1
  tensorboard_dir ................................. None
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 1000
  test_data_path .................................. None
  timing_log_level ................................ 0
  timing_log_option ............................... minmax
  titles_data_path ................................ None
  tokenizer_model ................................. None
  tokenizer_model_path ............................ /workspace/checkpoints/Yuan2.0-2B/2B/latest_checkpointed_iteration.txt
  tokenizer_type .................................. YuanTokenizer
  top_k ........................................... 5
  top_p ........................................... 0.0
  top_p_bound ..................................... 0.0
  top_p_decay ..................................... 0.0
  train_data_path ................................. None
  train_iters ..................................... None
  train_reset ..................................... None
  train_samples ................................... None
  transformer_impl ................................ local
  transformer_pipeline_model_parallel_size ........ 1
  untie_embeddings_and_output_weights ............. False
  use_checkpoint_args ............................. False
  use_checkpoint_opt_param_scheduler .............. False
  use_contiguous_buffers_in_local_ddp ............. True
  use_cpu_initialization .......................... None
  use_distributed_optimizer ....................... False
  use_flash_attn .................................. False
  use_lf_gate ..................................... True
  use_one_sent_docs ............................... False
  use_ring_exchange_p2p ........................... False
  use_rotary_position_embeddings .................. False
  valid_data_path ................................. None
  variable_seq_lengths ............................ False
  virtual_pipeline_model_parallel_size ............ None
  vision_backbone_type ............................ vit
  vision_pretraining .............................. False
  vision_pretraining_type ......................... classify
  vocab_extra_ids ................................. 0
  vocab_file ...................................... None
  vocab_size ...................................... None
  weight_decay .................................... 0.01
  weight_decay_incr_style ......................... constant
  world_size ...................................... 1
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 1

building YuanTokenizer tokenizer ...
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1727: FutureWarning: Calling LlamaTokenizer.from_pretrained() with the path to a single file or url is deprecated and won't be possible anymore in v5. Use a model identifier or the path to a directory instead.
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/yuan_2.0/tools/run_text_generation_server.py", line 64, in <module>
    initialize_megatron(extra_args_provider=add_text_generate_args,
  File "/workspace/yuan_2.0/megatron/initialize.py", line 50, in initialize_megatron
    set_global_variables(args)
  File "/workspace/yuan_2.0/megatron/global_vars.py", line 93, in set_global_variables
    _ = _build_tokenizer(args)
  File "/workspace/yuan_2.0/megatron/global_vars.py", line 126, in _build_tokenizer
    _GLOBAL_TOKENIZER = build_tokenizer(args)
  File "/workspace/yuan_2.0/megatron/tokenizer/tokenizer.py", line 45, in build_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer_model_path, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1900, in _from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 944, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 660, in _get_config_dict
    config_dict["_commit_hash"] = commit_hash
TypeError: 'int' object does not support item assignment
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2194) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0a0+4136153', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 788, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/run_text_generation_server.py FAILED

Failures:
  <NO_OTHER_FAILURES>

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-13_08:27:20
  host      : 37c6d04b76e7
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2194)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
zhaoxudong01 commented 9 months ago

TOKENIZER_MODEL_PATH=/workspace/checkpoints/Yuan2.0-2B/2B/latest_checkpointed_iteration.txt is the problem; this variable must be set to the tokenizer directory under the code repository (./tokenizer/), as sketched below.
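For anyone hitting the same TypeError, a minimal sketch of loading the tokenizer from a directory, mirroring the from_pretrained call in megatron/tokenizer/tokenizer.py shown in the traceback. The clone location /workspace/yuan_2.0 is inferred from the traceback paths; adjust it to your own checkout:

from transformers import LlamaTokenizer

# TOKENIZER_MODEL_PATH should name the tokenizer directory shipped with the
# Yuan-2.0 code, not a checkpoint metadata file:
tokenizer_dir = "/workspace/yuan_2.0/tokenizer"  # assumed clone location

# Mirrors the call in megatron/tokenizer/tokenizer.py; with a directory here,
# transformers finds real tokenizer files and the TypeError disappears:
tokenizer = LlamaTokenizer.from_pretrained(
    tokenizer_dir, add_eos_token=False, add_bos_token=False, eos_token="<eod>"
)
print(tokenizer.tokenize("hello"))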

Science2AI-TaoXu commented 9 months ago

OK, thank you. The problem is solved.