PaddlePaddle / Paddle

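PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)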
http://www.paddlepaddle.org/
Apache License 2.0
22.13k stars 5.55k forks

[Paper Reproduction Challenge] #42059

Open simuler opened 2 years ago

simuler commented 2 years ago

Describe the Bug

2022-04-21 10:23:52,958 - INFO - **common.configs**
2022-04-21 10:23:52,958 - INFO - use_gpu: True, use_xpu: False, use_npu: False, use_visual: False, train_batch_size: 1, train_data_dir: data/train, epochs: 3, print_interval: 2, model_save_path: output_model_esmm
2022-04-21 10:23:52,958 - INFO - **common.configs**
W0421 10:23:52.960151 8552 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0421 10:23:52.965579 8552 device_context.cc:465] device: 0, cuDNN Version: 7.6.
2022-04-21 10:23:56,141 - INFO - read data
2022-04-21 10:23:56,141 - INFO - reader path:reader
0%| | 3/25000 [00:00<27:10, 15.33it/s]
0%| | 0/25000 [00:00<?, ?it/s]
0%| | 3/25000 [00:00<20:57, 19.89it/s]
Traceback (most recent call last):
  File "../../../tools/trainer.py", line 223, in <module>
    main(args)
  File "../../../tools/trainer.py", line 145, in main
    dy_model, metric_list, batch, config)
  File "/home/aistudio/PaddleRec-master/models/multitask/Meta/dygraph_model.py", line 90, in train_forward
    loss = dy_model.global_update(sparse_tensor)
  File "/home/aistudio/PaddleRec-master/models/multitask/Meta/metamodel.py", line 256, in global_update
    loss_sup = self.local_update(support_set_xs[i], support_set_ys[i])
  File "/home/aistudio/PaddleRec-master/models/multitask/Meta/metamodel.py", line 230, in local_update
    grad = paddle.grad(loss, fast_parameters, create_graph=False, allow_unused=True)
  File "", line 2, in grad
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
    return wrapped_func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 229, in impl
    return func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 628, in grad
    retain_graph, allow_unused, only_inputs)
RuntimeError: (PermissionDenied) Reference count must be 0 when ready var auto411@GRAD is set
  [Hint: Expected ready_var.cur_ref_cnt == 0, but received ready_var.cur_ref_cnt:1 != 0:0.] (at /paddle/paddle/fluid/imperative/partial_grad_engine.cc:500)

Additional Supplementary Information

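While reproducing the paper, I found that the model trains normally when a certain hyperparameter is set to 2, but setting it to 3 raises the error above. This parameter controls the number of layers in a LayerList. Because the code base is large, could I privately send you an AI Studio project that reproduces the error?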

paddle-bot-old[bot] commented 2 years ago

Hi! We have received your issue and will arrange for technical staff to answer it as soon as possible; please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also look for an answer in the official API documentation, the FAQ, past GitHub Issues, and the AI community. Have a nice day!

huangjun12 commented 2 years ago

Could you provide a minimal code example that reproduces the problem?

simuler commented 2 years ago

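https://aistudio.baidu.com/studio/project/partial/verify/3823824/57e0b402a7514d7aa4c8e271499d7525 cd into the specified directory and the project can be run directly. When the num_output parameter is set to 2, the program runs normally; when num_output is greater than 2, the code fails at the call to the grad function. I printed the network structure and found nothing wrong with it.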