xxrrnn · closed 4 months ago
When I run stage1.sh, I get the following error:
```
Traceback (most recent call last):
  File "graphgpt/train/train_mem.py", line 20, in <module>
    train()
  File "/root/autodl-tmp/GraphGPT/graphgpt/train/train_graph.py", line 871, in train
    model_graph_dict = model.get_model().initialize_graph_modules(
  File "/root/autodl-tmp/GraphGPT/graphgpt/model/GraphLlama.py", line 139, in initialize_graph_modules
    clip_graph, args = load_model_pretrained(CLIP, self.config.pretrain_graph_model_path)
  File "/root/autodl-tmp/GraphGPT/graphgpt/model/GraphLlama.py", line 54, in load_model_pretrained
    assert osp.exists(osp.join(pretrain_model_path, 'config.json')), 'config.json missing'
AssertionError: config.json missing
```
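Reading the traceback, `initialize_graph_modules` passes `self.config.pretrain_graph_model_path` (a field read from the LLM's own config.json) into `load_model_pretrained`, which then requires a second config.json inside that graph-model directory. A minimal sketch that reproduces just the failing check, using the path from my config below:

```python
import os.path as osp

# Path taken from the pretrain_graph_model_path entry in my edited config.json.
graph_dir = "/root/autodl-tmp/GraphGPT/Arxiv-PubMed-GraphCLIP-GT/"
# This is the same check that raises in GraphLlama.py line 54.
assert osp.exists(osp.join(graph_dir, "config.json")), "config.json missing"
```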
Following other issues, I have already modified the config.json inside the vicuna directory as follows:
"_name_or_path": "vicuna-7b-v1.5-16k", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_sequence_length": 16384, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 4.0, "type": "linear" }, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000, "graph_hidden_size": 128, "pretrain_graph_model_path": "/root/autodl-tmp/GraphGPT/Arxiv-PubMed-GraphCLIP-GT/" }
The sh file is as follows:
```sh
model_path=./vicuna-7b-v1.5-16k
instruct_ds=./data/graph_matching.json
graph_data_path=./graph_data/all_graph_data.pt
pretra_gnn=clip_gt_arxiv
output_model=./stage_1

wandb offline

python3 -m torch.distributed.run --nnodes=1 --nproc_per_node=1 --master_port=20001 \
    graphgpt/train/train_mem.py \
    --model_name_or_path ${model_path} \
    --version v1 \
    --data_path ${instruct_ds} \
    --graph_content ./arxiv_ti_ab.json \
    --graph_data_path ${graph_data_path} \
    --graph_tower ${pretra_gnn} \
    --tune_graph_mlp_adapter True \
    --graph_select_layer -2 \
    --use_graph_start_end \
    --bf16 True \
    --output_dir ${output_model} \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2400 \
    --save_total_limit 1 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
```
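Note that `pretra_gnn=clip_gt_arxiv`, passed as `--graph_tower`, looks like a model name rather than a path; judging from the traceback, the actual checkpoint directory is resolved from `pretrain_graph_model_path` in the LLM's config.json, so that directory has to exist and contain its own config.json. A hypothetical pre-flight check before launching the script:

```python
import json
import os
import os.path as osp

vicuna_dir = "./vicuna-7b-v1.5-16k"  # model_path from the script above

# Resolve the graph tower directory the same way the failing code path does.
graph_dir = json.load(open(osp.join(vicuna_dir, "config.json")))["pretrain_graph_model_path"]
print("graph tower dir:", graph_dir)
print("contents:", os.listdir(graph_dir) if osp.isdir(graph_dir) else "MISSING")
```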
I worked out the fix by printing intermediate variables, but I would still hope the expected directory layout gets documented clearly so others can avoid this situation. A sketch of that debugging approach follows.
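For reference, roughly what the print-based debugging looked like (the function body here is abbreviated, not the repo's actual code; only the debug print is the addition):

```python
import os.path as osp

# Sketch of a temporary debug print just before the assert in
# graphgpt/model/GraphLlama.py (around line 54), to see which
# directory the loader is actually checking.
def load_model_pretrained(model_cls, pretrain_model_path):
    print(f"[debug] pretrain_graph_model_path resolves to: {pretrain_model_path!r}")
    assert osp.exists(osp.join(pretrain_model_path, "config.json")), "config.json missing"
    # ... original loading logic continues here ...
```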