PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

How to register fake_quantize_range_abs_max for a Custom Device #63888

Closed · engineer1109 closed this issue 1 week ago

engineer1109 commented 1 week ago

Please ask your question

The fake_quantize_range_abs_max kernel is registered with PD_REGISTER_STRUCT_KERNEL, so it cannot be used with a CustomDevice. How should it be registered?

engineer1109 commented 1 week ago

@ronny1996

engineer1109 commented 1 week ago

It seems this also works: PD_REGISTER_STRUCT_KERNEL( fake_quantize_range_abs_max, Custom, ALL_LAYOUT, xdx::FakeQuantizeRangeAbsMaxKernel, float, phi::dtype::float16) {}

engineer1109 commented 1 week ago

The backend can only be written literally as Custom.

engineer1109 commented 1 week ago

This is fairly awkward: you also need #undef PADDLE_WITH_CUSTOM_KERNEL before some header files can be included.

engineer1109 commented 1 week ago

Roughly, code like this:

#include "xdx_common.h"

#undef PADDLE_WITH_CUSTOM_KERNEL
#include "paddle/phi/core/dense_tensor.h"

#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/operator.h"
#include "paddle/phi/extension.h"
#include "paddle/phi/kernels/cpu/conv_util.h"

#include "xdxdnn.h"

#include "utils/xdxdnn_global_settings.h"

#include "PaddleUtilsAPI.h"

XDX_BEGIN_NAMESPACE

template <typename DeviceContext, typename T>
struct ClipAndFakeQuantFunctor {
    void operator()(const DeviceContext &ctx,
                    const phi::DenseTensor &in,
                    const phi::DenseTensor &scale,
                    const int bin_cnt,
                    const int round_type,
                    phi::DenseTensor *out);
};

template <typename T>
struct ClipAndFakeQuantFunctor<phi::CustomContext, T> {
    void operator()(const phi::CustomContext &ctx,
                    const phi::DenseTensor &in,
                    const phi::DenseTensor &scale,
                    const int bin_cnt,
                    const int round_type,
                    phi::DenseTensor *out) {
        int num = static_cast<int>(in.numel());
        // block/grid are leftovers from the commented-out CUDA launch below
        // and are unused by the xdxdnn call.
        int block = 1024;
        int grid = (num + block - 1) / block;

        const T *in_data = in.data<T>();
        const T *scale_data = scale.data<T>();
        T *out_data = out->mutable_data<T>(ctx.GetPlace());

        typedef typename xdx::TypeConverter<T>::type CalcType;
        xdxdnn::ClipAndQuant((cl_command_queue)ctx.stream(),
                             (CalcType *)in_data,
                             (CalcType *)scale_data,
                             (CalcType *)out_data,
                             bin_cnt,
                             round_type,
                             num);

        // ClipAndQuantKernel<T><<<grid, block, 0, ctx.stream()>>>(
        //     in_data, scale_data, bin_cnt, round_type, num, out_data);
    }
};

template <typename T, typename DeviceContext>
class FakeQuantizeRangeAbsMaxKernel : public paddle::framework::OpKernel<T> {
public:
    void Compute(const paddle::framework::ExecutionContext &context) const override {
        auto place = context.GetPlace();
        auto deviceType = place.GetDeviceType();
        auto *in = context.Input<phi::DenseTensor>("X");
        auto *in_scale = context.Input<phi::DenseTensor>("InScale");

        auto *out = context.Output<phi::DenseTensor>("Out");
        out->mutable_data<T>(context.GetPlace());

        bool is_test = context.Attr<bool>("is_test");
        int bit_length = context.Attr<int>("bit_length");
        int round_type = context.Attr<int>("round_type");
        int bin_cnt = std::pow(2, bit_length - 1) - 1;
        auto &dev_ctx = context.template device_context<DeviceContext>();

        // testing
        if (is_test) {
            xdx::ClipAndFakeQuantFunctor<DeviceContext, T>()(dev_ctx, *in, *in_scale, bin_cnt, round_type, out);
            return;
        }

        // // training
        // auto *out_scale = context.Output<phi::DenseTensor>("OutScale");
        // auto *out_scales = context.Output<phi::DenseTensor>("OutScales");
        // auto *iter = context.Input<phi::DenseTensor>("Iter");

        // int window_size = context.Attr<int>("window_size");
        // out_scale->mutable_data<T>(context.GetPlace());

        // phi::DenseTensor cur_scale;
        // T *cur_scale_data = cur_scale.mutable_data<T>({1}, context.GetPlace());
        // FindAbsMaxFunctor<DeviceContext, T>()(dev_ctx, in->data<T>(), in->numel(), cur_scale_data);
        // FindRangeAbsMaxFunctor<DeviceContext, T>()(
        //     dev_ctx, cur_scale, *in_scale, *iter, window_size, out_scales, out_scale);
        // ClipAndFakeQuantFunctor<DeviceContext, T>()(dev_ctx, *in, *out_scale, bin_cnt, round_type, out);
    }
};

XDX_END_NAMESPACE

PD_REGISTER_STRUCT_KERNEL(
    fake_quantize_range_abs_max, Custom, ALL_LAYOUT, xdx::FakeQuantizeRangeAbsMaxKernel, float, phi::dtype::float16) {}

But this is not done via PD_REGISTER_PLUGIN_KERNEL, and it still requires #undef PADDLE_WITH_CUSTOM_KERNEL before some header files can be used.

qili93 commented 1 week ago

I took a look; for the fake_quantize_range_abs_max operator you can follow the registration approach used by this operator: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/kernels/custom/c_embedding_kernel.cc

That file likewise registers c_embedding as a custom device operator. The registration interface is PD_REGISTER_KERNEL; it does not require #undef PADDLE_WITH_CUSTOM_KERNEL and can still include fluid headers. However, that registration code has to live inside the main Paddle framework; it cannot go into the Custom Device plugin package, otherwise the fluid headers cannot be included.

engineer1109 commented 1 week ago

@qili93 The operators you register with PD_REGISTER_STRUCT_KERNEL seem to have no declared inputs/outputs, and they trigger enforce errors such as input_args == 0 vs input_defs == 1.

engineer1109 commented 1 week ago

I hope that in the future operators like fake_quantize_range_abs_max can get an ops.yaml-based generation method.

qili93 commented 1 week ago

Hi, I looked through the PR history. PD_REGISTER_STRUCT_KERNEL exists to keep Paddle's old fluid operators compatible with the new phi operators. See the description of this PR: https://github.com/PaddlePaddle/Paddle/pull/49328

This issue stems from the historical migration of Paddle's operator library from fluid to phi. Specifically, fake_quantize_range_abs_max is an OP defined in the old fluid style, so it has no input/output parameters declared the way a PHI kernel does; its parameters follow the fluid operator definition. The OP definition can be found here: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/fake_quantize_op.cc#L576 . Old fluid operators are listed in this file: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/phi/api/yaml/op_compat.yaml#L1128 .

Thank you very much for the suggestion. I will relay your request to the operator team of the core framework and check whether there is a plan to migrate the remaining old fluid operators to PHI operators to avoid this class of problems. Thanks!