Open arbitraryking opened 1 year ago
Do you have paddle installed locally? You could first test whether the single-machine version runs.
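I got the single-machine version of the ranking model dnn to run. I have paddlepaddle 2.1.0 and paddlepaddle-gpu 2.4.2.post116 installed. The single-machine version of slot_dnn fails with the following error: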
(py37) E:\PaddleRec\PaddleRec\models\rank\slot_dnn>python -u ../../../tools/static_trainer.py -m config_queuedataset.yaml
2023-05-15 16:32:44,707 - INFO - cpu_num: None
2023-05-15 16:32:44,708 - INFO - **************common.configs**********
2023-05-15 16:32:44,708 - INFO - use_gpu: False, use_xpu: False, use_visual: False, train_batch_size: 2, train_data_dir: data/, epochs: 3, print_interval: 10, model_save_path: output_model_benchdnn_queue
2023-05-15 16:32:44,708 - INFO - **************common.configs**********
2023-05-15 16:32:45,986 - INFO - File list: ['E:\\PaddleRec\\PaddleRec\\models\\rank\\slot_dnn\\data//demo_10']
train file_list: ['E:\\PaddleRec\\PaddleRec\\models\\rank\\slot_dnn\\data//demo_10']
parse ins id: None
utils_path: E:\PaddleRec\PaddleRec\tools\utils\static_ps
abs_train_reader is: E:\PaddleRec\PaddleRec\models\rank\slot_dnn\criteo_reader
pipe_command is: python3.7 queuedataset_reader.py config_queuedataset.yaml E:\PaddleRec\PaddleRec\tools\utils\static_ps
dataset init thread_num: 1
2023-05-15 16:32:45,989 - INFO - Get Train Dataset
dataset get_reader thread_num: 1
2023-05-15 16:32:45,996 - INFO - AUC Reset To Zero: _generated_var_0
2023-05-15 16:32:45,996 - INFO - AUC Reset To Zero: _generated_var_1
2023-05-15 16:32:45,997 - INFO - AUC Reset To Zero: _generated_var_2
2023-05-15 16:32:45,997 - INFO - AUC Reset To Zero: _generated_var_3
2023-05-15 16:32:45,997 - INFO - AUC Reset To Zero: _generated_var_4
device worker program id: 2348362127944
I0515 16:32:46.040287 4596 hogwild_worker.cc:270] worker 0 train cost 0 seconds, batch_num: 0
2023-05-15 16:32:46,048 - INFO - epoch: 0 done, epoch time: 0.05 s
Traceback (most recent call last):
File "../../../tools/static_trainer.py", line 315, in <module>
main(args)
File "../../../tools/static_trainer.py", line 207, in main
prefix='rec_static')
File "E:\PaddleRec\PaddleRec\tools\utils\save_load.py", line 61, in save_static_model
paddle.static.save(program, model_prefix)
File "D:\Anaconda\envs\py37\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "D:\Anaconda\envs\py37\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 26, in __impl__
return wrapped_func(*args, **kwargs)
File "D:\Anaconda\envs\py37\lib\site-packages\paddle\fluid\framework.py", line 558, in __impl__
return func(*args, **kwargs)
File "D:\Anaconda\envs\py37\lib\site-packages\paddle\fluid\io.py", line 1876, in save
param_dict = {p.name: get_tensor(p) for p in parameter_list}
File "D:\Anaconda\envs\py37\lib\site-packages\paddle\fluid\io.py", line 1876, in <dictcomp>
param_dict = {p.name: get_tensor(p) for p in parameter_list}
File "D:\Anaconda\envs\py37\lib\site-packages\paddle\fluid\io.py", line 1872, in get_tensor
t = global_scope().find_var(var.name).get_tensor()
ValueError: (InvalidArgument) The Variable type must be class phi::DenseTensor, but the type it holds is class phi::SelectedRows.
[Hint: Expected holder_->Type() == VarTypeTrait<T>::kId, but received holder_->Type():8 != VarTypeTrait<T>::kId:7.] (at ..\paddle/fluid/framework/variable.h:58)
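One detail in the log above may be worth noting: the hogwild worker reports `batch_num: 0`, so no batches appear to have been trained before the save step was reached. Whether that is related to the `SelectedRows` error is not established here. A minimal sketch for flagging such workers in a saved log, assuming the log line format shown above (`find_empty_workers` is a hypothetical helper, not a PaddleRec function):

```python
import re

# Matches lines such as:
#   "... hogwild_worker.cc:270] worker 0 train cost 0 seconds, batch_num: 0"
WORKER_LINE = re.compile(r"worker (\d+) train cost (\d+) seconds, batch_num: (\d+)")


def find_empty_workers(log_text):
    """Return the ids of workers whose log line reports batch_num: 0."""
    empty = []
    for line in log_text.splitlines():
        match = WORKER_LINE.search(line)
        if match and int(match.group(3)) == 0:
            empty.append(int(match.group(1)))
    return empty


if __name__ == "__main__":
    sample = (
        "I0515 16:32:46.040287 4596 hogwild_worker.cc:270] "
        "worker 0 train cost 0 seconds, batch_num: 0"
    )
    print(find_empty_workers(sample))
```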
Let me reproduce this first. Once it is confirmed, I will fix it promptly.
Run the commands as described in doc/online_trainer.md.
I checked, and there is no Anaconda3 under C:\ProgramData\. I have not found where this python path can be configured.
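Before looking for a config option, it may help to see which interpreter names actually resolve on PATH in the active environment. The `pipe_command` in the earlier log invokes `python3.7` by name, and whether that name exists on this Windows setup is an open question. A minimal sketch using the standard library (`resolve_interpreter` is a hypothetical helper name, and the candidate names are only examples):

```python
import shutil
import sys


def resolve_interpreter(name):
    """Return the full path that `name` resolves to on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    # Candidate names are illustrative; adjust them to the command being run.
    for candidate in ("python3.7", "python3", "python"):
        print(candidate, "->", resolve_interpreter(candidate))
    # The interpreter running this script, for comparison.
    print("current interpreter:", sys.executable)
```

A name that prints `None` cannot be launched by that name from this environment, which narrows down where the hard-coded path is coming from.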