PaddlePaddle / Paddle

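PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)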
http://www.paddlepaddle.org/
Apache License 2.0

VisualDL reports an error when exporting a dynamic-graph model #45414

Closed kevinluoc closed 1 year ago

kevinluoc commented 2 years ago

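Describe the Bug

We want to output the model structure in VisualDL, but it keeps failing. Using the official example directly fails in the same way: "Failed to save model graph, error: ''".

The code below includes the relevant comments. Even a simple linear-regression model fails in the same way.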

import numpy as np
import paddle
import paddle.nn as nn
import paddle.optimizer as opt
from paddle.static import InputSpec
from visualdl import LogWriter

BATCH_SIZE = 16
BATCH_NUM = 4
EPOCH_NUM = 4
IMAGE_SIZE = 784
CLASS_NUM = 10

writer = LogWriter(logdir="./log/graph_test02/")

# define a random dataset
class RandomDataset(paddle.io.Dataset):
    def __init__(self, num_samples):
        self.num_samples = num_samples

    def __getitem__(self, idx):
        image = np.random.random([IMAGE_SIZE]).astype('float32')
        label = np.random.randint(0, CLASS_NUM - 1, (1, )).astype('int64')
        return image, label

    def __len__(self):
        return self.num_samples

class LinearNet(nn.Layer):
    def __init__(self):
        super(LinearNet, self).__init__()
        self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)

    def forward(self, x):
        return self._linear(x)

def train(layer, loader, loss_fn, opt):
    for epoch_id in range(EPOCH_NUM):
        for batch_id, (image, label) in enumerate(loader()):
            out = layer(image)
            loss = loss_fn(out, label)
            loss.backward()
            opt.step()
            opt.clear_grad()
            print("Epoch {} batch {}: loss = {}".format(
                epoch_id, batch_id, np.mean(loss.numpy())))

# create network
layer = LinearNet()
loss_fn = nn.CrossEntropyLoss()
adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())

# create data loader
dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
loader = paddle.io.DataLoader(dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    drop_last=True,
    num_workers=2)

# train
train(layer, loader, loss_fn, adam)

# save
path = "example.dy_model/linear"
paddle.jit.save(
    layer=layer,
    path=path,
    input_spec=[InputSpec(shape=[None, 784], dtype='float32')])

writer.add_graph(model=layer,
    input_spec=[paddle.static.InputSpec(shape=[None, 784], dtype='float32')],
    verbose=False)
writer.close()

The error also occurs here:

writer.add_graph(model=layer, input_spec=[paddle.static.InputSpec(shape=[None, 784], dtype='float32')], verbose=False)

When the error occurs, the code being executed is as follows:

import paddle
from paddle import Tensor
import paddle.fluid as fluid
from typing import *
import numpy as np
def forward(self, x):
    return paddle.jit.dy2static.convert_call(self._linear)(x)

This code is generated by Paddle at run time. Also, the error does not occur during training; it occurs only in the add_graph method.

Additional Supplementary Information

No response

paddle-bot[bot] commented 2 years ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your questions as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also check the API documentation, FAQ, GitHub Issues, and the AI community for an answer. Have a nice day!

zhiboniu commented 2 years ago

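Hello, the model that VisualDL visualizes should be a dynamic-graph model, so the problem is probably caused by the paddle.jit.save call here. You can refer to the official usage example: https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/components/README_CN.md#Graph--%E7%BD%91%E7%BB%9C%E7%BB%93%E6%9E%84%E7%BB%84%E4%BB%B6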

kevinluoc commented 2 years ago

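Hello, thank you for the reply. However, we did not write "paddle.jit.save" in our code. We only wrote this one statement: "writer.add_graph(model=layer, input_spec=[paddle.static.InputSpec(shape=[None, 784], dtype='float32')], verbose=False)". While debugging into it, we saw that "paddle.jit.save" is called internally. So is there a problem with the add_graph function itself? Thanks!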

zhiboniu commented 2 years ago

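No. This error is itself something that only appears with static-graph models. If I am not mistaken, the code below the "save" comment saves a static graph:

paddle.jit.save(
    layer=layer,
    path=path,
    input_spec=[InputSpec(shape=[None, 784], dtype='float32')])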

[image]

In any case, first confirm that the layer model passed to add_graph is a dynamic-graph model.

kevinluoc commented 2 years ago

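Thank you for the quick reply, and sorry that my earlier description misled you. We actually use the official sample code directly (which contains no paddle.jit.save), and it also fails in add_graph. To trace why it fails, we debugged into add_graph, where the Paddle framework itself generated this paddle.jit.save code and then raised the error. For details, please see the attachment I have added. Thanks!

Attachment: 20220829 Paddle visualdl 動態模型 sample error.docx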

kevinluoc commented 2 years ago

@zhiboniu I am not sure whether I have described this problem clearly. Is there any update on your side? Thanks!

rainyfly commented 2 years ago

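Hello, we have received your feedback. On my side, the code from the doc you attached runs successfully. Could you share your environment information, such as the Paddle version and the VisualDL version?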
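The versions asked for above can be collected with a short script. This is only a minimal sketch: it assumes the packages were installed under the distribution names paddlepaddle (or paddlepaddle-gpu) and visualdl, and the helper names installed_version and report_versions are made up for illustration.

```python
from importlib import metadata
import platform


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(dist_names=("paddlepaddle", "paddlepaddle-gpu", "visualdl")):
    """Map each distribution name to its version string (None if not installed)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # The interpreter and OS often matter as much as the package versions.
    print("python", platform.python_version(), "on", platform.platform())
    for name, version in report_versions().items():
        print(name, version or "not installed")
```

Pasting the printed lines into the issue, together with the output of conda list or pip list, should be enough for someone else to try to reproduce the environment.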

kevinluoc commented 2 years ago

Sorry for the late reply. Our environment is as follows:

paddlefsl 1.1.0
paddlenlp 2.4.0
paddlepaddle 2.3.0
vc 14.2
visualdl 2.3.0
vs2015_runtime 14.27.29016

conda list:

# packages in environment at C:\py39:
#
# Name                  Version       Build               Channel
aiohttp                 3.8.1         pypi_0              pypi
aiosignal               1.2.0         pypi_0              pypi
astor                   0.8.1         py39haa95532_0
async-timeout           4.0.2         pypi_0              pypi
attrs                   22.1.0        pypi_0              pypi
babel                   2.10.3        pypi_0              pypi
bce-python-sdk          0.8.73        pypi_0              pypi
blas                    1.0           mkl
brotli                  1.0.9         ha925a31_2
brotlipy                0.7.0         py39h2bbff1b_1003
ca-certificates         2022.4.26     haa95532_0
certifi                 2022.5.18.1   py39haa95532_0
cffi                    1.15.0        py39h2bbff1b_1
charset-normalizer      2.0.4         pyhd3eb1b0_0
click                   8.1.3         pypi_0              pypi
colorama                0.4.5         pypi_0              pypi
coloredlogs             15.0.1        pypi_0              pypi
colorlog                6.6.0         pypi_0              pypi
cryptography            37.0.1        py39h21b164f_0
cycler                  0.11.0        pyhd3eb1b0_0
datasets                2.4.0         pypi_0              pypi
decorator               5.1.1         pyhd3eb1b0_0
dill                    0.3.4         pypi_0              pypi
filelock                3.8.0         pypi_0              pypi
flask                   2.1.3         pypi_0              pypi
flask-babel             2.0.0         pypi_0              pypi
flatbuffers             2.0.7         pypi_0              pypi
fonttools               4.25.0        pyhd3eb1b0_0
freetype                2.10.4        hd328e21_0
frozenlist              1.3.1         pypi_0              pypi
fsspec                  2022.7.1      pypi_0              pypi
future                  0.18.2        pypi_0              pypi
gast                    0.3.3         py_0
huggingface-hub         0.9.1         pypi_0              pypi
humanfriendly           10.0          pypi_0              pypi
icu                     58.2          ha925a31_3
idna                    3.3           pyhd3eb1b0_0
importlib-metadata      4.12.0        pypi_0              pypi
intel-openmp            2021.4.0      haa95532_3556
itsdangerous            2.1.2         pypi_0              pypi
jieba                   0.42.1        pypi_0              pypi
jinja2                  3.1.2         pypi_0              pypi
joblib                  1.1.0         pypi_0              pypi
jpeg                    9e            h2bbff1b_0
kiwisolver              1.4.2         py39hd77b12b_0
libpng                  1.6.37        h2a8f88b_0
libprotobuf             3.20.1        h23ce68f_0
libtiff                 4.2.0         he0120a3_1
libwebp                 1.2.2         h2bbff1b_0
lz4-c                   1.9.3         h2bbff1b_1
markupsafe              2.1.1         pypi_0              pypi
matplotlib              3.5.1         py39haa95532_1
matplotlib-base         3.5.1         py39hd77b12b_1
mkl                     2021.4.0      haa95532_640
mkl-service             2.4.0         py39h2bbff1b_0
mkl_fft                 1.3.1         py39h277e83a_0
mkl_random              1.2.2         py39hf11a4ad_0
mpmath                  1.2.1         pypi_0              pypi
multidict               6.0.2         pypi_0              pypi
multiprocess            0.70.12.2     pypi_0              pypi
munkres                 1.1.4         py_0
numpy                   1.22.3        py39h7a0a035_0
numpy-base              1.22.3        py39hca35cd5_0
onnx                    1.12.0        pypi_0              pypi
onnxruntime             1.12.1        pypi_0              pypi
openssl                 1.1.1o        h2bbff1b_0
opt_einsum              3.3.0         pyhd3eb1b0_1
packaging               21.3          pyhd3eb1b0_0
paddle-bfloat           0.1.2         pypi_0              pypi
paddle2onnx             0.9.8         pypi_0              pypi
paddlefsl               1.1.0         pypi_0              pypi
paddlenlp               2.4.0         pypi_0              pypi
paddlepaddle            2.3.0         py39_cpu_windows    https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle
pandas                  1.4.3         pypi_0              pypi
pillow                  9.0.1         py39hdc2b20a_0
pip                     21.2.4        py39haa95532_0
protobuf                3.20.0        pypi_0              pypi
pyarrow                 9.0.0         pypi_0              pypi
pycparser               2.21          pyhd3eb1b0_0
pycryptodome            3.15.0        pypi_0              pypi
pyopenssl               22.0.0        pyhd3eb1b0_0
pyparsing               3.0.4         pyhd3eb1b0_0
pyqt                    5.9.2         py39hd77b12b_6
pyreadline3             3.4.1         pypi_0              pypi
pysocks                 1.7.1         py39haa95532_0
python                  3.9.12        h6244533_0
python-dateutil         2.8.2         pyhd3eb1b0_0
pytz                    2022.1        pypi_0              pypi
pyyaml                  6.0           pypi_0              pypi
qt                      5.9.7         vc14h73c81de_0
requests                2.27.1        pyhd3eb1b0_0
responses               0.18.0        pypi_0              pypi
scikit-learn            1.1.2         pypi_0              pypi
scipy                   1.9.1         pypi_0              pypi
sentencepiece           0.1.97        pypi_0              pypi
seqeval                 1.2.2         pypi_0              pypi
setuptools              61.2.0        py39haa95532_0
sip                     4.19.13       py39hd77b12b_0
six                     1.16.0        pyhd3eb1b0_1
sqlite                  3.38.3        h2bbff1b_0
sympy                   1.11.1        pypi_0              pypi
threadpoolctl           3.1.0         pypi_0              pypi
tk                      8.6.11        h2bbff1b_1
tornado                 6.1           py39h2bbff1b_0
tqdm                    4.64.0        pypi_0              pypi
typing-extensions       4.3.0         pypi_0              pypi
tzdata                  2022a         hda174b7_0
urllib3                 1.26.9        py39haa95532_0
vc                      14.2          h21ff451_1
visualdl                2.3.0         pypi_0              pypi
vs2015_runtime          14.27.29016   h5e58377_2
werkzeug                2.1.2         pypi_0              pypi
wheel                   0.37.1        pyhd3eb1b0_0
win_inet_pton           1.1.0         py39haa95532_0
wincertstore            0.2           py39haa95532_2
xxhash                  3.0.0         pypi_0              pypi
xz                      5.2.5         h8cc25b3_1
yarl                    1.8.1         pypi_0              pypi
zipp                    3.8.1         pypi_0              pypi
zlib                    1.2.12        h8cc25b3_2
zstd                    1.5.2         h19a0ad4_0

rainyfly commented 2 years ago

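LogWriter's add_graph method does indeed call paddle.jit.save to convert the dynamic graph to a static graph. You can check whether calling paddle.jit.save directly to do the dynamic-to-static conversion raises an error in your environment. If it does not, then in theory calling add_graph should not raise an error either.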
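The check suggested above could be scripted roughly as follows. This is a hedged sketch, not VisualDL's own diagnostic: run_step and isolate_add_graph_failure are hypothetical helper names, the working directory is arbitrary, and the input width of 784 is taken from the code in the report. Each step runs separately and the full traceback is kept, because an exception raised without a message formats as an empty string, which would be consistent with the reported "error: ''" (an assumption about the cause, not a confirmed diagnosis).

```python
import os
import traceback


def run_step(name, fn):
    """Run fn() and return (ok, details); details holds the full traceback on failure."""
    try:
        fn()
    except Exception:
        return False, f"{name} failed:\n{traceback.format_exc()}"
    return True, f"{name} succeeded"


def isolate_add_graph_failure(layer, input_dim=784, workdir="./isolate_test"):
    """Call paddle.jit.save and LogWriter.add_graph separately on the same layer.

    Paddle and VisualDL are imported here so that this file can be loaded
    on a machine where they are not installed.
    """
    import paddle
    from visualdl import LogWriter

    spec = [paddle.static.InputSpec(shape=[None, input_dim], dtype="float32")]

    def save_directly():
        # The same dynamic-to-static conversion that add_graph is said to perform.
        paddle.jit.save(layer, path=os.path.join(workdir, "linear"), input_spec=spec)

    def add_graph():
        writer = LogWriter(logdir=os.path.join(workdir, "log"))
        try:
            writer.add_graph(model=layer, input_spec=spec, verbose=False)
        finally:
            writer.close()

    return [run_step("paddle.jit.save", save_directly),
            run_step("LogWriter.add_graph", add_graph)]
```

With the LinearNet class from the report, printing the details of each result from isolate_add_graph_failure(LinearNet()) would show which of the two calls fails first. Whether add_graph re-raises the underlying exception or only prints its own message may depend on the VisualDL version, so the paddle.jit.save result is the more informative of the two.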

paddle-bot[bot] commented 1 year ago

Since you haven't replied for more than a year, we have closed this issue/PR. If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.