PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle ("飞桨"): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Fix paddle.mode and paddle.bincount API #63970

Closed xingmingyyj closed 1 week ago

xingmingyyj commented 2 weeks ago

PR Category

Others

PR Types

Bug fixes

Description

The paddle.mode and paddle.bincount APIs produce accuracy problems when a program containing them is built and executed in static-graph mode. Analysis shows the cause is the same as the issue encountered in #62801; the fix sets the output dtype according to the data type actually used in the kernel.
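
For reference, a minimal dygraph-vs-dy2st comparison for paddle.mode, in the same spirit as the bincount repro below; the function name, input shape, and dtype here are illustrative and not the exact test case of this PR:

import paddle

# Hypothetical minimal check, not the PR's test case: compare paddle.mode
# in dygraph mode against the same function after dygraph-to-static
# conversion; in the buggy case the static-graph side can carry an output
# dtype that disagrees with what the kernel actually produces.
def naive_mode(x):
    return paddle.mode(x, axis=-1)

x = paddle.rand([4, 8])
dy_values, dy_indices = naive_mode(x)
st_values, st_indices = paddle.jit.to_static(naive_mode)(x)
print(dy_indices.dtype, st_indices.dtype)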

paddle-bot[bot] commented 2 weeks ago

Your PR has been submitted. Thanks for your contribution to the open-source project! Please wait for the CI results first; see the Paddle CI Manual for details.

xingmingyyj commented 2 weeks ago

Additional details on the bincount error. When the following dygraph-to-static code is executed:

......
# Run the function in dygraph mode.
paddle.seed(33)
obj = naive_func
dy_out = obj(in_tensor, in_params, func)

# Run the same function after dygraph-to-static conversion.
paddle.seed(33)
jit_obj = paddle.jit.to_static(obj)
st_out = jit_obj(in_tensor, in_params, func)
print("dy_out is: ", dy_out)
print("st_out is: ", st_out)

# Save the translated program, then reload it.
paddle.jit.save(jit_obj, path="bincount")
print("jit.save is successful !!!")

paddle.seed(33)
jit = paddle.jit.load("bincount")
print("jit.load is successful !!!")

# Feed the inputs (sorted by key) into the loaded TranslatedLayer.
paddle.seed(33)
inputs_key = sorted(in_tensor.keys())
inputs_value = []
for k in inputs_key:
    inputs_value.append(in_tensor[k])
# print('inputs_value is: ', inputs_value)
res = jit(*inputs_value)
print('jit.load res: ', res)

# Compare the dygraph result against the reloaded static-graph result.
compare(dy_out, res, delta=1e-5, rtol=1e-6)

The error reported is:

Traceback (most recent call last):
  File "/home/aistudio/fix_op/Paddle/tools/fix_bitcount.py", line 106, in <module>
    res = jit(*inputs_value)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1475, in __i_m_p_l__
    return _run_dygraph(self, input, program_holder)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1002, in _run_dygraph
    _legacy_C_ops.run_program(
ValueError: In user code:

    InvalidArgumentError: The type of data we are trying to retrieve (int32) does not match the type of data (int64) currently contained in the container.
      [Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():9 != phi::CppTypeToDataType<T>::Type():7.] (at /home/aistudio/fix_op/Paddle/paddle/phi/core/dense_tensor.cc:161)
      [operator < pd_kernel.phi_kernel > error]  [operator < run_program > error]

Here we can see that, at the scale op, the tensor's actual data type does not match the data type it currently expects. The computation graph executed by the executor is as follows:

{
    (%0) = "data(phi_kernel)" () {dtype:(pd_op.DataType)bool,is_persistable:[false],kernel_key:<backend:GPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"data",name:"_jst.0.a.0",op_name:"pd_op.data",place:(pd_op.Place)Place(gpu:0),shape:(pd_op.IntArray)[],stop_gradient:[false]} : () -> gpu_tensor<10xi32>
    (%1) = "full(phi_kernel)" () {dtype:(pd_op.DataType)int32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)0} : () -> cpu_tensor<1xi32>
    (%2) = "bincount(phi_kernel)" (%0, <<NULL VALUE>>, %1) {is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"bincount",op_name:"pd_op.bincount",stop_gradient:[false]} : (gpu_tensor<10xi32>, <<NULL TYPE>>, cpu_tensor<1xi32>) -> gpu_tensor<-1xi32>
    (%3) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%4) = "scale(phi_kernel)" (%2, %3) {bias:(Float)0,bias_after_scale:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[false]} : (gpu_tensor<-1xi32>, cpu_tensor<1xf32>) -> gpu_tensor<-1xi32>
    () = "builtin.shadow_output" (%4) {output_name:"translated_layer/scale_0.tmp_0"} : (gpu_tensor<-1xi32>) -> 
}
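
As a quick cross-check (a hedged sketch, not part of the PR itself), the dygraph kernel already reports int64 for the weightless case, while the program above types the bincount result as gpu_tensor<-1xi32>:

import paddle

# In dygraph mode the weightless bincount output is int64 (see the kernel
# code below), whereas the pre-fix static-graph program above declares it
# as an int32 tensor -- the mismatch the scale op then trips over.
x = paddle.to_tensor([0, 1, 1, 3, 2, 1, 7], dtype='int32')
print(paddle.bincount(x).dtype)  # paddle.int64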

The guess is that this is caused by how the dtype is set in the InferMeta: weight is empty here and x.dtype is int32, so the output is given dtype int32, which disagrees with the following logic in the kernel (a small Python-side check of that behaviour is sketched after the snippet).

  if (!has_weights) {
    int64_t* output_data = dev_ctx.template Alloc<int64_t>(output);
    phi::funcs::SetConstant<Context, int64_t>()(
        dev_ctx, output, static_cast<int64_t>(0));

    KernelBincount<T, InputT, int64_t>
        <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
            input_data, input_numel, has_weights, weights_data, output_data);
  }
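
A small sketch of the dtype policy the kernel exposes, as observed from the Python side in dygraph mode (hedged illustration; the InferMeta needs to mirror both branches):

import paddle

# Without weights the kernel allocates an int64 output; with float32
# weights it follows the weights' dtype.
x = paddle.to_tensor([0, 1, 1, 2, 2, 2], dtype='int64')
w = paddle.to_tensor([0.5, 1.0, 1.5, 2.0, 2.5, 3.0], dtype='float32')

print(paddle.bincount(x).dtype)             # paddle.int64
print(paddle.bincount(x, weights=w).dtype)  # paddle.float32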