PaddlePaddle / Paddle

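PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)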
http://www.paddlepaddle.org/
Apache License 2.0

Conv2D/ConvTranspose2d lacks checking of parameter values for exceptions #64839

Open PhyllisJi opened 4 months ago

PhyllisJi commented 4 months ago

Describe the Bug

import paddle
import paddle.nn as nn

# Set up a Conv2D layer (note dilation=[8, 1])
conv = paddle.nn.Conv2D(in_channels=8, out_channels=8, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[8, 1], groups=8, bias_attr=None)

input_tensor = paddle.randn([1, 8, 14, 14])  # (batch_size, in_channels, height, width)
output = conv(input_tensor)
print(output.shape)

This raises an unclear error message:

RuntimeError: (PreconditionNotMet) The meta data must be valid when call the mutable data function.
  [Hint: Expected valid() == true, but received valid():0 != true:1.] (at /Users/paddle/xly/workspace/77aceb8e-2a5b-4fe4-a675-547f3aad14a4/Paddle/paddle/phi/core/dense_tensor.cc:127)
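The error above can be traced to the output shape. As a minimal sketch, assuming the standard convolution output-size formula (the helper name `conv2d_out_size` is hypothetical and not part of Paddle), the parameters from the reproduction give an output height of 0:

```python
def conv2d_out_size(in_size, kernel, stride, padding, dilation):
    """Standard convolution output-size formula along one spatial axis.

    Hypothetical helper for illustration; not a Paddle API.
    """
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1


# Parameters from the Conv2D reproduction: 14x14 input, kernel 3, stride 1,
# padding 1, dilation [8, 1].
out_h = conv2d_out_size(14, kernel=3, stride=1, padding=1, dilation=8)
out_w = conv2d_out_size(14, kernel=3, stride=1, padding=1, dilation=1)
print(out_h, out_w)  # 0 14
```

With dilation 8 the effective kernel height is 17, which is larger than the padded input height of 16, so no valid output row exists.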

Abnormal parameter combinations should be caught in higher-level code with an explicit hint, e.g. that an output dimension would be 0. In some cases the missing check can also lead to a hard crash:

import paddle

x = paddle.randn([1, 8, 14, 14])
p = paddle.nn.Conv2DTranspose(in_channels=8, out_channels=8, kernel_size=[3, 3], stride=[1, 1], padding=[0, 8], output_padding=[0, 0], dilation=[1, 1], groups=8, bias_attr=None)
print(p(x).shape)

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::eager_api_depthwise_conv2d_transpose(_object*, _object*, _object*)
1   depthwise_conv2d_transpose_ad_func(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, paddle::experimental::IntArrayBase<paddle::Tensor>, std::string, int, std::vector<int, std::allocator<int> >, std::string)
2   paddle::experimental::depthwise_conv2d_transpose(paddle::Tensor const&, paddle::Tensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, paddle::experimental::IntArrayBase<paddle::Tensor> const&, std::string const&, int, std::vector<int, std::allocator<int> > const&, std::string const&)
3   void phi::DepthwiseConv2dTransposeKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, paddle::experimental::IntArrayBase<phi::DenseTensor> const&, std::string const&, int, std::vector<int, std::allocator<int> > const&, std::string const&, phi::DenseTensor*)

----------------------
Error Message Summary:
----------------------
FatalError: `Erroneous arithmetic operation` is detected by the operating system.
  [TimeInfo: *** Aborted at 1717396982 (unix time) try "date -d @1717396982" if you are using GNU date ***]
  [SignalInfo: *** SIGFPE (@0x7f53fa52208d) received by PID 2818 (TID 0x7f54897c14c0) from PID 18446744073614270605 ***]
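The kind of higher-level check requested here could look like the sketch below. It assumes the standard transposed-convolution output-size formula; both helper names are hypothetical, and this is not Paddle's actual fix. For the reproduction above, the computed output width is 0, which plausibly explains the arithmetic fault in the kernel.

```python
def conv2d_transpose_out_size(in_size, kernel, stride, padding, dilation,
                              output_padding=0):
    """Standard transposed-convolution output size along one spatial axis.

    Hypothetical helper for illustration; not a Paddle API.
    """
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + 1 + output_padding)


def check_output_shape(layer_name, sizes):
    """Hypothetical guard: reject non-positive output sizes with a clear hint."""
    for axis, size in enumerate(sizes):
        if size <= 0:
            raise ValueError(
                f"{layer_name}: computed output size {size} along spatial "
                f"axis {axis} is not positive; check kernel_size, padding, "
                f"stride and dilation against the input shape."
            )
    return sizes


# Parameters from the Conv2DTranspose reproduction: 14x14 input, kernel 3,
# stride 1, padding [0, 8], dilation 1, output_padding 0.
out_h = conv2d_transpose_out_size(14, kernel=3, stride=1, padding=0, dilation=1)
out_w = conv2d_transpose_out_size(14, kernel=3, stride=1, padding=8, dilation=1)

try:
    check_output_shape("Conv2DTranspose", [out_h, out_w])
except ValueError as err:
    message = str(err)
    print(message)
```

Raising a `ValueError` before the kernel is dispatched would turn the SIGFPE into an ordinary Python exception that names the offending axis.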

Additional Supplementary Information

No response

JZ-LIANG commented 4 months ago

The bug is reproduced. The cause is that the `dilation` argument is not handled properly. The code will be updated to add more precondition checks.

JZ-LIANG commented 4 months ago

Thanks for the bug report.