conv、conv_transpose and pool support NHWC and asymmetric padding in MKLDNN

luotao1 commented 4 years ago

背景

Paddle之前仅部分OP支持channel_last(NHWC)格式输入。TensorFlow目前CV模型默认均为channel_last格式输入, 但支持NHWC和NCHW两种格式输入。以fluid.layers.conv2d_transpose(input, num_filters, filter_size=None, ..., data_format='NCHW')为例，用户通过设置data_format参数指定输入和输出的格式。
Padding之前涉及到Paddlnng操作的OP，如conv, pool都只支持对称的Padding方式，不支持两侧Padding size不同。

需求

conv_mkldnn_op, conv_transpose_mkldnn_op, pool_mkldnn_op需要支持NHWC和非对称padding。

序号	Paddle接口	升级功能点	使用建议
1	paddle.fluid.layers.pool2d(input, pool_size=-1, pool_type='max', pool_stride=1, `pool_padding=0`, global_pooling=False, use_cudnn=True, ceil_mode=False, name=None, exclusive=True, `data_format="NCHW"`)	(1)`pool_padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NHWC格式	使用 `data_format`指定数据格式
2	paddle.fluid.layers.pool3d(input, pool_size=-1, pool_type='max', pool_stride=1, `pool_padding=0`, global_pooling=False, use_cudnn=True, ceil_mode=False, name=None, exclusive=True, `data_format="NCDHW"`)	(1)`pool_padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NDHWC格式	使用 `data_format`指定数据格式
3	paddle.fluid.layers.conv2d(input, num_filters, filter_size, stride=1, `padding=0`, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None,`data_format="NCHW"`)	(1)`padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NHWC格式	使用 `data_format`指定数据格式
4	paddle.fluid.layers.conv3d(input, num_filters, filter_size, stride=1, `padding=0`, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None,`data_format="NCDHW"`)	(1)`padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NDHWC格式	使用 `data_format`指定数据格式
5	paddle.fluid.layers.conv2d_transpose(input, num_filters, filter_size, stride=1, `padding=0`, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None,`data_format="NCHW"`)	1)`padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NDHWC格式	使用 `data_format`指定数据格式
6	paddle.fluid.layers.conv3d_transpose(input, num_filters, filter_size, stride=1, `padding=0`, dilation=1, groups=None, param_attr=None, bias_attr=None, use_cudnn=True, act=None, name=None,`data_format="NCDHW"`)	(1)`padding`:支持非对称padding和"SAME" "VALID"; (2)新增`data_format` 参数，可支持NDHWC格式	使用 `data_format`指定数据格式

NHWC如果mkldnn不能支持，需要强制报错
在1.6.2版本中发布此增强功能。

参考文档

Op升级工作概况.pdf

luotao1 commented 4 years ago

@bingyanghuang Please translate it in English, thanks!

bingyanghuang commented 4 years ago

@lidanqing-intel Please help broadcast in Poland team.

luotao1 commented 4 years ago

补充一些信息：

conv升级相关情况.pdf
channel last格式支持涉及Op - AGroup –.pdf
非对称padding涉及的MKLDNN：conv, pool, conv_transpose

luotao1 commented 4 years ago

data_format参数疑问 @jianhang-liu 提出

API输入是什么数据格式，输出也是什么数据格式，因此这里的data_format是表示输入格式还是其他？
如果是表示输入格式，那API对应的C++ Op内部其实就是做个格式检查？
tensor里面是含有layout_信息的，为什么API还需要有这个参数，从tensor层层往下传递不可以么？ https://github.com/PaddlePaddle/Paddle/blob/d3003a16200ee17f004f4f1877a5547fb77f387b/paddle/fluid/framework/tensor.h#L196

解答

这里的data_format是表示输入格式。
- TODO: 针对这块，我们会在文档上明确指出，且指出输出格式是严格和输入格式一致的。 @zhangting2020
API对应的C++ Op内部确实只做格式检查。我们会根据data_format，去检查输入的shape与其他参数的设置是不是符合要求。比如conv，有一项检查就是根据data_format，去取输入的通道维度大小，判断通道维度的大小与卷积核的in_channel是否一致。 https://github.com/PaddlePaddle/Paddle/blob/4922eb6da5b651f4457f9ebf526a065e68de0e65/paddle/fluid/operators/conv_op.cc#L83-L90
- TODO：我们会增强所有相关Op在此处的报错，并且给出提示，即可能是NHWC或NCHW造成的出错。 @zhangting2020
经和 @lanxianghit 讨论，API需要这个参数的原因有以下几个：
- 为了区分它的输入数据到底是什么格式。
- 原来因为只有一种NCHW格式，所以可以层层往下传递。但现在有两种格式了，必然需要一个地方来进行判断。
- 如果不加在conv, pool等API中，那必须加在fluid.layers.data里面。但加在data里面，有两个问题：
  - data不应该有data_format这个概念，因为很多输入数据是文本类的。
  - 有些网络，比如OCR_Recognition，前半段是RNN，后半段是CNN，即NCHW格式的输出是在网络中间产生的。

zhangting2020 commented 4 years ago

API对应的C++ Op内部确实只做格式检查。我们会根据data_format，去检查输入的shape与其他参数的设置是不是符合要求。比如conv，有一项检查就是根据data_format，去取输入的通道维度大小，判断通道维度的大小与卷积核的in_channel是否一致。

这里描述有误差，修改如下：

API对应的C++ Op内部确实只做格式检查。我们会根据data_format，去检查输入的shape与其他参数的设置是不是符合要求。比如conv，有一项检查就是根据data_format，去取输入的通道维度大小，判断通道维度的大小与卷积核的channels * groups是否一致。

bingyanghuang commented 4 years ago

@luotao1 Could you provide us some unit tests you used for checking data_format support?

luotao1 commented 4 years ago

conv_transpose: #20072
group_norm: #19614
interpolate: #19914

zhangting2020 commented 4 years ago

image_classification.tar.gz 参考源码：https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification 使用NHWC格式输入，涉及的修改：参考该链接查看代码diff

models/mobilenet_v1.py: 网络各层的数据格式设置为NHWC
reader.py: 231-233行，读取的img为HWC格式，不需transpose
增加了train.sh文件，训练的image_shape设置为224, 224, 3，data_dir设置为自己的路径。可以直接执行该脚本开始训练

bingyanghuang commented 4 years ago

@jczaja and @grygielski Please follow this issue and update your progress in this issue.

bingyanghuang commented 4 years ago

image_classification.tar.gz Source code：https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification Use NHWC as the input，modifications include：

models/mobilenet_v1.py: data format of each layers are set to be NHWC
- reader.py: line 231-233，load img as HWC，don't need to transpose
- Add train.sh，training with image_shape 224, 224, 3，data_dir you can set by yourself. You can use this script to start the training directly. @jczaja @grygielski

grygielski commented 4 years ago

Asymetric padding related PR: https://github.com/PaddlePaddle/Paddle/pull/21062

jczaja commented 4 years ago

@zhangting2020 , @bingyanghuang I have a problem running provided : image_classification example.

Command: ./train.sh

Error log: _TypeError: 'set_shape(): incompatible function arguments. The following argument types are supported:\n Invoked with: <paddle.fluid.coreavx.VarDesc object at 0x7f714561fc70>, [None, 224, 224, 3]'

Problem description:

self.desc.set_shape(shape) where shape is [None, 224, 224, 3] , while set_shape's signature is void VarDesc::SetShape(const std::vector<int64_t> &dims) , so None does not fit into std::vector<int64_t>
[None, 224, 224, 3] shape is created here : https://github.com/PaddlePaddle/models/blob/e693a68593810788072590aacfe03643a0098c6c/PaddleCV/image_classification/utils/utility.py#L308

Modification I made:

setuse_gpu=False
disabled check_version() # it was failing on most recent Paddle develop I used (paddle --version, return 0.0.0 when build from source)
addeddata_dir with path to imagenet root dir

environment used:

Centos 7.5
GCC 7.3
python 2.75

Please advice how to have this issue fixed.

zhangting2020 commented 4 years ago

@jczaja I tried the following settings but no error occurred. When I enable check_version(), it can also work. I'm not sure if the latest develop made some changes that caused the error.

use_gpu=False
disabled check_version()
used Paddle develop but not most recent

I will try the latest develop. Before I get the conclusion, you can try the following modifications in utils/utility.py:

    feed_image = fluid.data(
        name="feed_image",
        shape=[args. batch_size] + image_shape,
        dtype="float32",
        lod_level=0)

    feed_label = fluid.data(
        name="feed_label", shape=[args. batch_size, 1], dtype="int64", lod_level=0)
    feed_y_a = fluid.data(
        name="feed_y_a", shape=[args. batch_size, 1], dtype="int64", lod_level=0)

zhangting2020 commented 4 years ago

@jczaja I tried the latest Paddle develop and it can work. It doesn't seems that the latest Paddle develop causes the error. Please follow my advice above.

environment used:

ubuntu 16.04
gcc 5.4.0
python 2.7.12

jczaja commented 4 years ago

@zhangting2020 Thanks for suggestion.

Regarding NHWC data_format, It seems that Paddle CPU ops were extended to support NHWC data_format e.g. Pooling implementation tailored for NHWC was added and common functions like: InferShape were extended to support data_format. For MKL-DNN kernels this approach is not suitable. MKL-DNN memory objects are always using NCHW dims description even if format underneath is NHWC, NCHW16C etc. Hence InferShape cannot consider _dataformat attrib value.

Things to do: 1) Extend InferShape with checking if Kernel is MKL-DNN. If positive , ignore _dataformat e.g. Compute shape for NCHW. I'm planning to add function OperatorWithKernel::IsMKLDNNType() to get info if op's kernel is MKL-DNN 2) When converting Tensor from non MKL-DNN to MKL-DNN In DataTransform , _dataformat will be read and based on that Tensor's mkl-dnn format will be set . Additionaly reshape will be done from NHWC to NCHW. This will happen here: https://github.com/PaddlePaddle/Paddle/blob/78cc1ca6164ecd2aa841068195c0e0ab0c3070c9/paddle/fluid/framework/data_transform.cc#L51-L60 . I do not know how to get _dataformat at pointed location so please advice. 3) When converting Tensor from MKL-DNN to non MKL-DNN. If operator is having _dataformat then it value will determine target data format to convert to e.g. NCHW or NHWC from MKL-DNN formats e.g. (NCWH16C, NHWC, NCHW .... ). There maybe situation that operator after which coversion to paddle data_formats happen does not have _dataformat then should We convert to data_format of last op that had this attribute value ?? For example let assume there is model: data op , conv(mkldnn) , lrn(mkldnn), relu(mldnn), fetch op In between of data and conv there will be conversion from NHWC to MKL-DNN(NCHW16C) format , then conv, lrn and relu will work on NCHW16C and once relu is done there should be conversion to NHWC (as well as reshape). But Relu does not have _dataformat . So how to get information that model is expecting output in NHWC ?

To summarize, please advice on points 2 and 3 and share your opinion on presented approach.

zhangting2020 commented 4 years ago

对于points 2 ：是否可以考虑在conv_op.cc 中覆盖下面的GetKernelTypeForVar方法。 https://github.com/PaddlePaddle/Paddle/blob/35f17ae28f292a602274a93d0398286c3b0f1afb/paddle/fluid/framework/operator.cc#L1199-L1203 覆盖该方法时，可以获取conv_op的data_fromat属性，如果Kernel是MKL-DNN，返回OpKernelType如下，其他情况返回的OpKernelType与默认的GetKernelTypeForVar方法保持一致。

const std::string data_format = ctx->Attrs().Get<std::string>("data_format");
return OpKernelType(expected_kernel_type.data_type_, tensor.place(), 
                         framework::StringToDataLayout(data_format));

那么，在TransformData中，就可以根据lin的值去做NHWC到NCHW的reshape。 https://github.com/PaddlePaddle/Paddle/blob/78cc1ca6164ecd2aa841068195c0e0ab0c3070c9/paddle/fluid/framework/data_transform.cc#L40

对于points 3：是否可以考虑在MKLDNNDeviceContext中缓存一个最近的data_format，MKL-DNN到non MKL-DNN的转换，根据缓存的这个data_format进行。

zhangting2020 commented 4 years ago

@jczaja Thank you for sharing your points. About what you mentioned, my consideration is as follows:

For points 2, The following GetKernelTypeForVar method can be overridden in conv_op.cc https://github.com/PaddlePaddle/Paddle/blob/35f17ae28f292a602274a93d0398286c3b0f1afb/paddle/fluid/framework/operator.cc#L1199-L1203

In this method, the data_fromat attribute of conv_op can be obtained. If the Kernel is MKL-DNN, the OpKernelType is returned as follows. The OpKernelType returned in other cases is consistent with the default GetKernelTypeForVar method.

const std::string data_format = ctx->Attrs().Get<std::string>("data_format");
return OpKernelType(expected_kernel_type.data_type_, tensor.place(), 
                         framework::StringToDataLayout(data_format));

Then, in TransformData, reshape form NHWC to NCHW will be done according to the value of lin. https://github.com/PaddlePaddle/Paddle/blob/78cc1ca6164ecd2aa841068195c0e0ab0c3070c9/paddle/fluid/framework/data_transform.cc#L40

For points 3, a recent data_format can be cached in MKLDNNDeviceContext, so that the conversion of MKL-DNN to non MKL-DNN can be performed according to the cached data_format.

jczaja commented 4 years ago

@zhangting2020 , @luotao1 I apologize for lack of update on this issue for last couple of days.

To add _dataformat NHWC/NDHWC support for conv, conv_transpose, batch_norm and lrn are quite complex hence it would safer is to disable NHWC for mkl-dnn kernels for 1.6 release. Please review PR #21191 for 1.6 with error messages when MKL-DNN kernel that is having data_format set .
For current develop. There is PR #21192 that is enabling NHWC support for pooling op (only FWD). It was only tested on provided unit tests so more testing is needed, but I would like PaddlePaddle developers to take a look and express an opinion on it. If you think this approach is fine then once I finish more tests then it could be merged and then carry on with adapting other ops (pool grad, conv, conv_transpose...)
To have grad pool done I would like you to write if I can assume that input to grad e.g. _OutGrad tensor is also to have NHWC data format if only pool op was defined with data_format="NHWC" ?
For conv and conv_transpose ops : if _dataformat="NHWC" does it mean that params (weights and bias) are also having channel_last arranged data (OHWC ) ?

zhangting2020 commented 4 years ago

@jczaja

3. To have grad pool done I would like you to write if I can assume that input to grad e.g. _OutGrad tensor is also to have NHWC data format if only pool op was defined with data_format="NHWC" ?

Yes, the format of input to grad e.g. Out_Grad tensor is consistent with data_format.

4. For conv and conv_transpose ops : if _dataformat="NHWC" does it mean that params (weights and bias) are also having channel_last arranged data (OHWC ) ?

No matter what the data_format is, the shape of params does not change. It means the shape of weights will always be [O, C, H, W] and bias [O,1],

jczaja commented 4 years ago

@zhangting2020 So for input X (output of previously executed op or shape of incoming signal) shape does depend on _dataformat. if we have _dataformat="NCHW" and shape is [2,3,7,7] then for same op and _dataformat="NHWC" shape is [2,7,7,3] . But for param(weights and bias) inputs situation is diffrent eg. for regardless _dataformat="NCHW" or "NHWC" it is always the same shape for example [96,3,2,2] ?

Some update from Today: 1) Error reporting PR was rebased to develop: #21207 2) NHWC support for LRN FWD was implemented & tested on created UT. PR #21192 combines Pool and LRN FWD.

Next Steps: 1) Maintain PRs 2) Once I know on how to treat params(weights and biases) I will enable FWD ops of conv and conv_transpose 3) When FWD Ops are working then I will investigate on enabling mkl-dnn grad ops.

zhangting2020 commented 4 years ago

@jczaja Sorry to reply to you so late.

But for param(weights and bias) inputs situation is diffrent eg. for regardless _dataformat="NCHW" or "NHWC" it is always the same shape for example [96,3,2,2] ?

yes, as you described.

jczaja commented 4 years ago

Some update:

1) Still trying to pass CI with PR implementing error meesages for NHWC request to MKL-DNN ops 2) Currently implementing conv FWD mkl-dnn NHWC support. Once conv FWD NHWC works, I will update #21192 to have pool, LRN and conv FWD all together.

jczaja commented 4 years ago

One thing that popped up when implementing conv MKL-DNN op NHWC support: data_format attribute was present in conv_mkldnn for a long time and its meaning is different to what is expected from this task (introducing NHWC support) which is to enforce which MKL-DNN implementation is to be chosen. So if data_format is "NHWC" then MKL-DNN instead of using its efficient NCHW16C implementation will convert data to NHWC and execute convolution in this format as well. This behaviour is different to what we want with current data_format reading, where "NHWC" or "NCHW" just determine what is input data arrangement as it comes into model and what should be arrangement of output data from the model. I disscussed with @Sand3r- and we decided that I will align data_format reading with concept realised in this thread e.g. conv mkl-dnn kernel will not consider data_format as a way to determine which mkl-dnn implementation to execute. As a result we need to find a way to test our NCHW16C integration paths for the sake of coverage and unit-test based functionality testing.

jczaja commented 4 years ago

@luotao1 , @zhangting2020 I have encounter a problem and would like you to advice me on possible solution:

When MKL-DNN op is followed by non MKL-DNN op then based on stored information of data_format we do conversion from MKL-DNN to NCHW or NHWC . This happens in DataTransform & Fetch Op. Problem is in unit tests of grad op where to compute numeric grad number of FWD ops on perturbed input is executed. Problem is that those ops are executed directly without Executor , so if FWD op is MKL-DNN kind then there is no fetch op following FWD op and no conversion from MKL-DNN format to NCHW, which may result in badly numerical grad computation for mkl-dnn Unit tests. Code I'm describing (execution of FWD op without Executor and fetch op to do back conversion): https://github.com/PaddlePaddle/Paddle/blob/69dd5152cff75b1f595952cb9360e25739966150/python/paddle/fluid/tests/unittests/op_test.py#L78 How do you recommend to solve this problem?

luotao1 commented 4 years ago

How do you recommend to solve this problem?

@jianhang-liu Could you help see this problem?

zhangting2020 commented 4 years ago

@jczaja I think I am a bit confused. After the calculation is completed, the result needs to be converted from MKL-DNN format to NCHW. Does this mean that the input is in NCHW format? For the existing NCHW format support, the numerical grad computation for mkl-dnn Unit tests should be correct.

jczaja commented 4 years ago

@zhangting2020 Ok, I try to clarify. Setting data_format=NHWC to op that is having MKL-DNN kernel does not mean (according my current implementation) that MKL-DNN kernel will operate (compute) using tensor's data stored in NHWC. Reason is that performance of this solution would be slow as computation on MKL-DNN blocked format (NCHW16C, NCHW8C) are much faster. So Part of implemention :

After MKL-DNN kernel finish computing and next operator is non-MKLDNN (fetch op or any other) then there is conversion to NHWC from MKL-DNN format. If after MKL-DNN kernel there is another MKL-DNN kernel then no conversions to NHWC happening in between. Only when next one is non-MKLDNN then conversion(reorder) to NHWC happens.

This works fine for models executed via executor or Analysis predictor as there is fetch op at the end, but I realised that it does not work when we directly execute operator e.g. op->Run() . So this situation is not supported. My question was if you advice how to handle this situation. I'm quite sure that I could check if MKL-DNN operator is the only/final one or check if executor is present and based on that trigger conversion inside MKL-DNN kernel if there is no more operators to be executed.

zhangting2020 commented 4 years ago

@jczaja Thank you for your explanation. Please try the user_defined_grads to get the correct numerical grad computation, which can refer to https://github.com/PaddlePaddle/Paddle/blob/08c19c585d8ad366a56af9ea97669ed6f164cc3d/python/paddle/fluid/tests/unittests/test_cvm_op.py#L82-L88

You can write the user_defined_grads method based on the existing get_numeric_gradient, in which the op is run by executor instead of op->Run() so that the result is converted from MKL-DNN format to NHWC.

jczaja commented 4 years ago

@zhangting2020 , @luotao1 I have implemented NHWC support to FWD ops: LRN, pool, conv2d and conv2d_transpose . It was tested only on provided unit tests. It is available at: #21192.

1) If you have an inference NHWC workload then please test against #21192. Please report problems if any.

jczaja commented 4 years ago

Subset of #21192 was reformed as PR #21375 (to fit into limit of modified files per PR). Please review. Conv and conv_transpose will be set up as PR once LRN and pool are reviewed and merged. On development side, I will start to enable grad ops.

jczaja commented 4 years ago

@zhangting2020 , @luotao1 I have a question regarding GRAD ops.

I'm working on enabling pool grad unit tests. Sequence of ops in executor is: feed -> pool2d -> mean -> fill_constant -> mean_grad -> pool2d_grad Output from pool2d is named "Out" and One of inputs to pool2d_grad is named "Out@GRAD". Question: 1) How and Where (in code) is "Out@Grad" created (in particulat setting its shape) ?

zhangting2020 commented 4 years ago

@jczaja Here get_grad_op_desc is called https://github.com/PaddlePaddle/Paddle/blob/add62acfd1d7e8f14ceba22b1c049c25027e760e/python/paddle/fluid/backward.py#L797-L798 Then GradOpMaker returns grad_opmaker of the pool2d_grad https://github.com/PaddlePaddle/Paddle/blob/630be31952f2216009851a31b55a3243c2b57f77/paddle/fluid/pybind/pybind.cc#L1035-L1043 pool2d_grad uses the DefaultGradOpMaker, so I think the "Out@GRAD" is create here https://github.com/PaddlePaddle/Paddle/blob/630be31952f2216009851a31b55a3243c2b57f77/paddle/fluid/framework/grad_op_desc_maker.h#L216

And the shape is set here https://github.com/PaddlePaddle/Paddle/blob/630be31952f2216009851a31b55a3243c2b57f77/python/paddle/fluid/backward.py#L922-L923

Hope it helps.

jczaja commented 4 years ago

Thanks to @luotao1 I realised that NHWC support (merged and in PRs) is not implemented to batch_norm of mkl-dnn this is my mistake as attribute is named data_layout not data_format as in other ops, so I missed this when grepping through the code. I will implement NHWC batch norm before continuing work on grad support.

jczaja commented 4 years ago

Current status: All requested FWD ops should have NHWC support now e.g. conv, conv_transpose, LRN, pool2d, batch_norm. I working on enabling GRAD ops for NHWC .

jczaja commented 4 years ago

@zhangting2020 , I would like you advise me on following problem. Below are two pictures showing simple training procedure (a bit extended unit test). Problem is that MeanGrad operator is inferring its output based on output from pool2d operator: https://github.com/PaddlePaddle/Paddle/blob/9a4dd1bc25c9a023e2dd5f4b6f5a415b22ed488a/paddle/fluid/operators/mean_op.cc#L61

Output of pool2d is MKL-DNN layout and input to mean is NHWC layout, bot represent the same data, but dims order is different. Visual scheme is like this:

nhwc-pool-grad-before

while it would be convenient if Mean grad::InferShape is using actual input to Mean Op , like that:

nhwc-pool-grad-after

How can It be done?

zhangting2020 commented 4 years ago

@jczaja For the execution process of mean grad op, my understanding is： When layout of the input that is the output of pool2d is different from the layout of the kernel type, data layout transform will occur first. Then mean grad op infers shape according to the transformed input. I have a question: Why does the input of mean grad op come directly from the untransformed output of pool2d?

jczaja commented 4 years ago

@zhangting2020 I investigated why wrong Tensor/Variable is taken to Mean Grad Op. Reason is that MeanGrad is declaring "X" input as not needing buffer: https://github.com/PaddlePaddle/Paddle/blob/d528ffaa040e9d64f1b97662e9ff297756921b88/paddle/fluid/operators/mean_op.cc#L89

This cause that Data Transform will not Analysis of This input before Mean Grad Op: https://github.com/PaddlePaddle/Paddle/blob/d528ffaa040e9d64f1b97662e9ff297756921b88/paddle/fluid/framework/operator.cc#L1105-L1109

which in particular means that no reshape of dimensions will happen. So, kMKLDNN layout is describing dims in NCHW order , but for Paddle Operator that is no MKL-DNN we need dims described in NHWC order when data layout: kNHWC is used.

I tried to solutions to this problem: 1) Modified code in DataTransform to look into Vars that are having no buffer Vars , just to create reshaped Tensor and register in new scope this new variable. 2) Remove declaration of mean grad op that "X" does not need buffer.

The better solution is option: 1 , but implementation was a bit to complex and I decided to implement second option. Feel free to comment on that and perhaps suggesting some better approach.

luotao1 commented 4 years ago

I decided to implement the second option

@sneaxiy Could we remove declaration of mean grad op that "X" does not need buffer?

sneaxiy commented 4 years ago

Modified code in DataTransform to look into Vars that are having no buffer Vars , just to create reshaped Tensor and register in new scope this new variable.

Remove declaration of mean grad op that "X" does not need buffer.

The better solution is option: 1 , but implementation was a bit to complex and I decided to implement second option.

@jczaja @luotao1 I do not think that the second option is a common solution. If we remove DECLARE_NO_NEED_BUFFER_VARS_INFERENCE of mean_grad op in this case, we would remove all DECLARE_NO_NEED_BUFFER_VARS_INFERENCE of all ops in the future once the similar situation occurs. DECLARE_NO_NEED_BUFFER_VARS_INFERENCE is designed to save memory consumption of models, and it is a very important feature of memory optimization strategy in PaddlePaddle. We should not remove them easily. I prefer the first option, which is a more common solution and would keep DECLARE_NO_NEED_BUFFER_VARS_INFERENCE of all ops.

sfraczek commented 3 years ago

Hi,

I was experimenting with enabling mkldnn in dygraph and there is an issue.

const std::string data_format = ctx->Attrs().Get<std::string>("data_format");

There is no ctx object in GetKernelTypeForVar. The actual code in develop looks like this: https://github.com/PaddlePaddle/Paddle/blob/d0a921ba98b6cc3012e04606f7846db378eb4275/paddle/fluid/operators/conv_op.cc#L195-L198

The this object has no attributes in dygraph mode. This causes crash when trying to access "data_format" in empty AttributeMap. It works in regular mode.

PaddlePaddle / Paddle

conv、conv_transpose and pool support NHWC and asymmetric padding in MKLDNN #20964

背景

需求

参考文档

解答