Open anderson101866 opened 2 months ago
You need to train with the corresponding configuration file; you can refer to the llama example.
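Understood. So is the `--model_name_or_path gpt3-1.3B-en` form no longer supported? <-- Q1
It looks like this configuration used to be importable from paddlenlp/transformers/gpt/configuration.py:118. @wawltor Q2: My main question is, how do I now pass the predefined GPT3-1.3B configuration above into run_pretrain.py?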
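Training is currently driven mainly by configuration files. The approach above may leave some parameters missing, so I suggest adapting your setup from the GPT example configuration file.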
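One way to check for missing parameters when adapting from an example configuration is to diff the keys of the two JSON files. The following is only a minimal sketch: `missing_keys` is a hypothetical helper, not part of PaddleNLP, and the two abbreviated configs are made up for illustration rather than copied from the repository.

```python
import json


def missing_keys(reference: dict, candidate: dict) -> list:
    """Return the keys that the reference config has but the candidate lacks."""
    return sorted(set(reference) - set(candidate))


# Hypothetical, abbreviated configs (illustration only, not the real example file).
reference_cfg = json.loads(
    '{"model_name_or_path": "x", "max_seq_length": 2048, "do_train": true}'
)
my_cfg = json.loads('{"model_name_or_path": "gpt3-1.3B-en", "do_train": true}')

# Lists every key that would need to be added or consciously omitted.
print(missing_keys(reference_cfg, my_cfg))
```

In practice the two dictionaries would come from `json.load` on the example config and on your own file. A key-level diff only shows which parameters are absent; it does not say whether their values are appropriate.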
@wawltor Thanks for your reply. Unfortunately, that does not appear to be the root cause; the problem persists.
Following your steps, I created gpt3-1.3B-en.json:
{
"model_name_or_path": "gpt3-1.3B-en",
"tokenizer_name_or_path": "gpt3-1.3B-en",
"input_dir": "/workspace/dataset",
"output_dir": "output/paddlenlp_gpt3/debug/model_output",
"bf16": true,
"sequence_parallel": true,
"tensor_parallel_degree": 8,
"sharding_parallel_degree": 1,
"sharding": "stage2",
"pipeline_parallel_degree": 1,
"virtual_pp_degree": 1,
"pipeline_parallel_config": "disable_partial_send_recv",
"per_device_train_batch_size": 72,
"per_device_eval_batch_size": 72,
"gradient_accumulation_steps": 32,
"split": "949,50,1",
"max_seq_length": 2048,
"fuse_attention_qkv": true,
"use_flash_attention": true,
"fp16_opt_level": "O2",
"learning_rate": 0.00001,
"min_learning_rate": 0.000005,
"save_steps": 100000,
"weight_decay": 0.01,
"warmup_ratio": 0.01,
"max_grad_norm": 1.0,
"logging_steps": 1,
"dataloader_num_workers": 1,
"hidden_dropout_prob": 0.1,
"attention_probs_dropout_prob": 0.1,
"max_steps": 32,
"eval_steps": 100000,
"report_to": "visualdl",
"disable_tqdm": true,
"do_train": true,
"continue_training": 0,
"device": "gpu"
}
Then I ran:
python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 --ips 127.0.0.1 --log_dir output/paddle_gpt3/debug llm/run_pretrain.py ./gpt3-1.3B-en.json
Traceback (most recent call last):
File "/home/scratch.ameng_gpu/git/2PaddleNLP_anderson/llm/run_pretrain.py", line 605, in <module>
main()
File "/home/scratch.ameng_gpu/git/2PaddleNLP_anderson/llm/run_pretrain.py", line 511, in main
model = model_class.from_config(config, dtype=dtype)
File "/home/scratch.ameng_gpu/git/2PaddleNLP_anderson/paddlenlp/transformers/auto/modeling.py", line 269, in from_config
model_class = cls._get_model_class_from_config(None, None, config)
File "/home/scratch.ameng_gpu/git/2PaddleNLP_anderson/paddlenlp/transformers/auto/modeling.py", line 218, in _get_model_class_from_config
init_class = architectures.pop() if len(architectures) > 0 else None
TypeError: object of type 'NoneType' has no len()
GPT3-1.3B is a public model. Could you try reproducing this directly on your side?
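For readers following the traceback, the failure can be reproduced in isolation. This is a minimal sketch, assuming `architectures` arrives as `None`, as the error message suggests. Only the conditional expression is taken from the traceback; the function names, the guarded variant, and the `"SomeModelClass"` string are hypothetical and are not PaddleNLP code or a proposed patch.

```python
def pick_init_class_original(architectures):
    # Same expression as the last frame of the traceback.
    return architectures.pop() if len(architectures) > 0 else None


def pick_init_class_guarded(architectures):
    # Hypothetical defensive variant: treat None like an empty list.
    architectures = list(architectures or [])
    return architectures.pop() if architectures else None


try:
    pick_init_class_original(None)
except TypeError as exc:
    # len(None) raises before the "else None" branch can be reached.
    print(type(exc).__name__, exc)

print(pick_init_class_guarded(None))
print(pick_init_class_guarded(["SomeModelClass"]))  # placeholder class name
```

The guard only avoids the crash. With no architecture name available, the loader would still have no model class to instantiate, so the underlying question of why the config carries no architectures remains open.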
This issue is stale because it has been open for 60 days with no activity.
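Software environment
Duplicate issues
Error description
When I try to train "GPT3-1.3B" with llm/run_pretrain.py, an error occurs during model initialization: `architectures` is None for some unknown reason.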
Log
[2024-08-16 06:13:15,389] [ INFO] - We are using
Steps to reproduce & code