[Hackathon 5th No.38] Add the FractionalMaxPool2d / FractionalMaxPool3d API to Paddle (kernel)

[Hackathon 5th No.38] Add the FractionalMaxPool2d / FractionalMaxPool3d API to Paddle (kernel) #59847

Closed megemini closed 4 months ago

megemini commented 5 months ago

PR types

New features

PR changes

APIs

Description

RFC: https://github.com/PaddlePaddle/community/pull/698

RFC V2.1: https://github.com/PaddlePaddle/community/pull/798

Related PR: https://github.com/PaddlePaddle/Paddle/pull/59130

This PR re-implements the API on top of newly created operators.

paddle-bot[bot] commented 5 months ago

Your PR has been submitted. Thanks for your contribution! Please wait for the result of CI firstly. See Paddle CI Manual for details.

megemini commented 5 months ago

Update 20231212

@Charles-hit

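This PR addresses the compatibility issue raised in https://github.com/PaddlePaddle/Paddle/pull/59130 by re-implementing fractional max pooling 2d/3d as two separate kernels. The main differences are:

Two new kernels are implemented, fractional_max_pool2d_with_index and fractional_max_pool3d_with_index, together with their grad kernels.
Everything kernel-related is implemented separately and does not touch max_pool2d_with_index or max_pool3d_with_index.
Only the two parameters relevant to fractional max pooling are kept: output_size and random_u (kernel_size is computed dynamically, so output_size is used directly instead of the original ksize parameter).
Only the fractional max pooling logic is kept; the max pooling and adaptive max pooling parts of the original max_poolxd_with_index are removed.
The xpu part is not implemented. The previous PR reused max_poolxd_with_index and therefore had to provide the xpu signatures; now that the kernels are re-implemented, the xpu part is no longer needed.
The operators fractional_max_pool2d_with_index and fractional_max_pool3d_with_index are tested separately, with test classes that inherit from OpTest.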

Files involved:

paddle/phi/api/yaml/backward.yaml: backward operator definitions
paddle/phi/api/yaml/op_compat.yaml: operator argument compatibility
paddle/phi/api/yaml/ops.yaml: forward operator definitions
paddle/phi/infermeta/backward.cc: backward operator
paddle/phi/infermeta/backward.h: backward operator
paddle/phi/infermeta/unary.cc: operator InferMeta
paddle/phi/infermeta/unary.h: operator InferMeta
paddle/phi/kernels/cpu/pool_grad_kernel.cc: operator registration
paddle/phi/kernels/cpu/pool_kernel.cc: operator registration
paddle/phi/kernels/funcs/pooling.cc: CPU operator implementation
paddle/phi/kernels/funcs/pooling.cu: GPU operator implementation
paddle/phi/kernels/funcs/pooling.h: adds the index computation algorithm for fractional max pooling
paddle/phi/kernels/gpu/pool_grad_kernel.cu: operator registration
paddle/phi/kernels/gpu/pool_kernel.cu: operator registration
paddle/phi/kernels/impl/pool_grad_kernel_impl.h: header file
paddle/phi/kernels/impl/pool_kernel_impl.h: header file
paddle/phi/kernels/pool_grad_kernel.h: header file
paddle/phi/kernels/pool_kernel.h: header file
python/paddle/nn/__init__.py: adds the API
python/paddle/nn/functional/__init__.py: adds the API
python/paddle/nn/functional/pooling.py: implements fractional_max_pool2d and fractional_max_pool3d
python/paddle/nn/layer/__init__.py: adds the API
python/paddle/nn/layer/pooling.py: implements FractionalMaxPool2D and FractionalMaxPool3D
test/legacy_test/test_fractional_max_pool2d_api.py: tests the 2d API
test/legacy_test/test_fractional_max_pool2d_op.py: tests the 2d operator
test/legacy_test/test_fractional_max_pool3d_api.py: tests the 3d API
test/legacy_test/test_fractional_max_pool3d_op.py: tests the 3d operator
test/white_list/op_accuracy_white_list.py: adds the operators to the accuracy white list, following max_poolxd_with_index
test/white_list/op_threshold_white_list.py: adds the operators to the accuracy white list, following max_poolxd_with_index

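The operators and APIs now pass all tests locally (Ubuntu) and most of the CI. Only a few Windows-related jobs seem to fail; has something been wrong with the Windows CI over the past few days? In addition, there is this failure on Windows OpenBLAS: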

2023-12-12 14:17:36 ======================================================================
2023-12-12 14:17:36 FAIL: test_check_grad (test_fractional_max_pool3d_op.TestMaxPoolWithIndex_Op)
2023-12-12 14:17:36 ----------------------------------------------------------------------
2023-12-12 14:17:36 Traceback (most recent call last):
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\test_fractional_max_pool3d_op.py", line 175, in test_check_grad
2023-12-12 14:17:36     self.check_grad({'X'}, ['Out'])
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 2969, in check_grad
2023-12-12 14:17:36     self.check_grad_with_place(
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 3242, in check_grad_with_place
2023-12-12 14:17:36     self._assert_is_close(
2023-12-12 14:17:36   File "C:\home\workspace\Paddle\build\test\legacy_test\op_test.py", line 2926, in _assert_is_close
2023-12-12 14:17:36     self.assertLessEqual(max_diff, max_relative_error, err_msg())
2023-12-12 14:17:36 AssertionError: 1.0352556758097368e+23 not less than or equal to 0.005 : Operator fractional_max_pool3d_with_index error, Gradient Check On Place(cpu) variable X (shape: (2, 3, 7, 7, 7), dtype: float64) max gradient diff 1.035256e+23 over limit 5.000000e-03, the first error element is 0, expected 6.172840e-03, but got 6.390467e+20.
2023-12-12 14:17:36 ----------------------------------------------------------------------

This looks like an OpenBLAS problem. Could the CPU float64 implementation in OpenBLAS be faulty? Other PRs seem to have hit something similar; see this issue: https://github.com/PaddlePaddle/Paddle/issues/55707
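For context on what the failing assertion compares, here is a minimal pure-Python sketch of a gradient check on a 1-D max pooling with fixed regions. It is not Paddle's OpTest code: the function names, the mean-of-outputs loss and the step size are assumptions chosen for illustration. The analytic gradient routes 1/n_out to the argmax of each region, and it is compared against a central-difference estimate.

```python
def max_pool_1d(x, regions):
    """Max over each half-open region [left, right)."""
    return [max(x[left:right]) for left, right in regions]


def mean_loss(x, regions):
    # Assumed scalar loss: the mean of all pooled outputs.
    out = max_pool_1d(x, regions)
    return sum(out) / len(out)


def analytic_grad(x, regions):
    # Each region sends 1/n_out to the position of its maximum.
    grad = [0.0] * len(x)
    for left, right in regions:
        window = x[left:right]
        argmax = left + window.index(max(window))
        grad[argmax] += 1.0 / len(regions)
    return grad


def numeric_grad(x, regions, delta=1e-3):
    # Central differences, one input element at a time.
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += delta
        minus[i] -= delta
        diff = mean_loss(plus, regions) - mean_loss(minus, regions)
        grad.append(diff / (2 * delta))
    return grad


x = [2.0, 4.0, 3.0, 1.0, 5.0, 2.0, 3.0]
regions = [(0, 1), (1, 3), (3, 4), (4, 6), (6, 7)]
max_diff = max(
    abs(a - n)
    for a, n in zip(analytic_grad(x, regions), numeric_grad(x, regions))
)
print(max_diff)
```

A correct backward pass keeps max_diff tiny. A value on the order of 1e+23, as in the log above, suggests that the gradient buffer itself holds invalid values rather than an ordinary rounding difference.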

Also, if this PR is acceptable, should the original PR, https://github.com/PaddlePaddle/Paddle/pull/59130, be closed?

Please review. Thanks a lot!

Charles-hit commented 5 months ago

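I looked at that issue, and the error there seems fairly small. Yours looks extremely large, as if something overflowed. How about adding some logging in CI to debug it? In principle this unit test has to pass as well.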

megemini commented 5 months ago

Update 20231220

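Use UnchangedInferMeta instead of the original operator
Register the CPU kernels with float16 support (the pooling operators are a bit messy; it might be worth considering a refactor ... 🤣🤣🤣)
Add float16/bfloat16 unit tests
Add range tests for random_u
Increase the timeout of test_fractional_max_pool2d_op/test_fractional_max_pool3d_op in the coverage tests. The 3d operator test timed out in CI because the 3d operator works on fairly large data, so I raised the timeout. Is that acceptable?

The main CI checks now pass. I am not sure why PR-CI-GpuPS and PR-CI-LLM failed; it does not seem to be caused by these two operators.

The Windows CI failed earlier. Was something wrong with the Windows CI during those days ...

Also, I have replied to the earlier review comments.

@Charles-hit Please review.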

Charles-hit commented 5 months ago

These two pipelines need to be rebuilt.

Charles-hit commented 5 months ago

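One more small question: torch does have a kernel_size parameter. Where is that functionality reflected in our implementation? The RFC lists this parameter, but the PR removed it.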

megemini commented 5 months ago

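One more small question: torch does have a kernel_size parameter. Where is that functionality reflected in our implementation? The RFC lists this parameter, but the PR removed it.

This was already confirmed earlier! https://github.com/PaddlePaddle/community/pull/698#discussion_r1386019125

Only output_size is needed here. As explained above, once these two operators are split out, output_size replaces the original parameter name ksize, since there is no need to reuse that meaningless name.

Besides, torch's implementation appears to deviate from the original paper. The fractional max pooling implementations I can find are mostly built on the tensorflow interface, e.g. https://github.com/ND15/Fractional-Max-Pooling, or are hand-written, e.g. https://github.com/diogo149/theano_fractional_max_pooling. Neither kind has a kernel_size parameter, because kernel_size is derived by the program rather than supplied by the user. So, judging from the material I can find, torch's approach may be problematic.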

megemini commented 5 months ago

Update 20231221

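Remove the win32 restriction from the tests
Set paddle.set_device manually in the unit tests

CI now passes. As for the earlier Windows inference failure, I do not have that test environment, so I can only infer possible causes:

If a GPU is present, tensors should be created on the GPU by default, but the Windows inference environment creates them on the CPU. As a result, in the check use_cuda and core.is_bfloat16_supported(place), the test still runs on a CPU tensor even when place is a GPU, which causes the failure. After setting set_device manually, the problem went away.
It is also possible that a different CUDA version caused tensor creation to fail.

@Charles-hit @luotao1 Could you check whether there are any other problems?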

megemini commented 5 months ago

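I checked the coverage; the case where random is None is not tested. It can be added later.

Right, I did not write one, because when random_u is None the random result is hard to compare against numpy. I will add a separate test that only checks that the output is produced normally.

core.is_bfloat16_supported(paddle.CPUPlace()) returns true, so you can simply use use_cuda to decide whether to test bf16.

I would rather keep both; at least it reads more explicitly.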

megemini commented 5 months ago

Update 20231221

Added a test case for random_u being None

@Charles-hit Please review.

luotao1 commented 4 months ago

The Chinese documentation can be submitted now.

jeff41404 commented 4 months ago

please add an rfc link to the description

megemini commented 4 months ago

please add an rfc link to the description

OK ~

added and here : https://github.com/PaddlePaddle/community/pull/698

Charles-hit commented 4 months ago

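@megemini Thank you for your contribution. We discussed this internally, and two points still need to be changed or justified:
1. In principle the parameters of a new op must match the API parameters, so the return_mask parameter needs to move into the C++ API, and the mask output can be made optional.
2. The kernel_size question again: your conclusion is that it can be left out, but the conversion tool has to convert between torch and paddle. Take the following code, for example: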

import torch
import torch.nn.functional as F

input = torch.randn(20, 16, 50, 32)
F.fractional_max_pool2d(input, 3, output_size=(13, 12))

How would the corresponding paddle usage be expressed?

megemini commented 4 months ago

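1. In principle the parameters of a new op must match the API parameters, so the return_mask parameter needs to move into the C++ API, and the mask output can be made optional.

Got it. Could you point me to a reference op? I would like to see how this kind of optional output is currently handled. Thanks!

How would the corresponding paddle usage be expressed?

I think this question can be answered from several angles.

First, if the pytorch approach is itself flawed, then there should be no suitable mapping. In other words, why must we be able to map torch's interface, especially when what it does may be wrong? On the other hand, the tensorflow implementation is easy to map: tensorflow has only one required parameter, pooling_ratio, and our output_size can be derived from it with a simple conversion.

Second, if a mapping is required, simply leave out torch's kernel_size parameter: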

torch
input = torch.randn(20, 16, 50, 32)
F.fractional_max_pool2d(input, 3, output_size=(13, 12))

paddle
F.fractional_max_pool2d(input, output_size=(13, 12))

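If the concern is that the converted result differs from torch's: this interface is inherently random, so exact agreement is hard to achieve anyway. That is why I follow torch's random-sequence approach and use the random_u parameter, which guarantees reproducibility, instead of tensorflow's random-seed approach. Of course, the result can also be fixed through paddle.seed.

Finally, I do not know whether you have actually tried this torch interface. Its results seem unreliable to me: kernel_size can be larger than the stride, so many pooled outputs are identical and pooling loses its purpose, and a kernel_size that is too small is equally meaningless. I have compared paddle's implementation against tensorflow's results; both pool within individual grid cells, and the results are comparatively reliable. You are welcome to verify this yourself.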
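The pooling_ratio conversion mentioned above can be sketched as follows. This assumes that each pooled dimension has size floor(input_size / pooling_ratio), which is how the TensorFlow documentation describes the output size; the helper name output_size_from_ratio is hypothetical and belongs to neither framework.

```python
import math


def output_size_from_ratio(input_size, pooling_ratio):
    """Derive an output_size tuple from per-dimension pooling ratios.

    Assumption: output = floor(input / ratio) for every pooled dimension.
    """
    return tuple(
        math.floor(n / r) for n, r in zip(input_size, pooling_ratio)
    )


spatial_size = (50, 32)       # H, W of the input from the torch example
pooling_ratio = (1.44, 1.73)  # illustrative ratios for H and W
print(output_size_from_ratio(spatial_size, pooling_ratio))
```

The resulting tuple could then be passed as output_size to the paddle API discussed in this thread.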

Charles-hit commented 4 months ago

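@megemini Hello, I tried testing this example. With the seeds fixed through paddle.seed and torch.manual_seed, the results do not match. If the problem lies in torch's parameters (the algorithm itself may be fine), could we hard-code the parameters so that they are identical and then test whether the results agree? When the conversion tool later converts between torch and paddle, this operator at least has to produce consistent results. For the optional output, you can refer to the `rmsprop` operator.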

megemini commented 4 months ago

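@megemini Hello, I tried testing this example. With the seeds fixed through paddle.seed and torch.manual_seed, the results do not match. If the problem lies in torch's parameters (the algorithm itself may be fine), could we hard-code the parameters so that they are identical and then test whether the results agree? When the conversion tool later converts between torch and paddle, this operator at least has to produce consistent results.

Using a seed should give different results, shouldn't it?! Each framework has its own random number generation mechanism, and the CPU and GPU mechanisms differ as well. 🤨🤨🤨

For example: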

In [14]: paddle.seed(2023)
Out[14]: <paddle.fluid.libpaddle.Generator at 0x7fb6bc965a70>

In [15]: paddle.rand((2, 3))
Out[15]: 
Tensor(shape=[2, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[0.81200254, 0.99631780, 0.51082075],
        [0.43302748, 0.22371240, 0.32796350]])

In [16]: torch.manual_seed(2023)
Out[16]: <torch._C.Generator at 0x7fb77202aa10>

In [17]: torch.rand((2, 3))
Out[17]: 
tensor([[0.4290, 0.7201, 0.9481],
        [0.4797, 0.5414, 0.9906]])

In [19]: paddle.set_device('cpu')
Out[19]: Place(cpu)

In [20]: paddle.seed(2023)
Out[20]: <paddle.fluid.libpaddle.Generator at 0x7fb6bc965a70>

In [21]: paddle.rand((2, 3))
Out[21]: 
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.86583614, 0.52014720, 0.25960937],
        [0.90525323, 0.42400089, 0.40641287]])

A seed only guarantees reproducible results within a single framework. Can a seed really guarantee identical results across frameworks?
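As a framework-independent illustration of that point, the sketch below seeds two different generators with the same value: Python's built-in Mersenne Twister and a small linear congruential generator. The LCG and both helper names are assumptions made for this example and say nothing about how Paddle or PyTorch generate random numbers. Each generator reproduces its own stream, but the two streams differ.

```python
import random


def mersenne_stream(seed, n):
    # Python's random.Random uses the Mersenne Twister algorithm.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def lcg_stream(seed, n, a=1664525, c=1013904223, m=2**32):
    # A textbook linear congruential generator, scaled to [0, 1).
    state = seed % m
    values = []
    for _ in range(n):
        state = (a * state + c) % m
        values.append(state / m)
    return values


same_generator_twice = mersenne_stream(2023, 6) == mersenne_stream(2023, 6)
across_generators = mersenne_stream(2023, 6) == lcg_stream(2023, 6)
print(same_generator_twice, across_generators)
```

Matching results across frameworks therefore requires the same algorithm as well as the same seed, which is why passing the random value explicitly (random_u here) is the more portable option.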

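Moreover, torch's test for this operator, pytorch/test/nn/test_pooling.py, only checks the output shape and never verifies that the output values are correct. 🤣🤣🤣

For the optional output, you can refer to the rmsprop_ operator.

Where this operator is used in RMSProp, the returned values are not used.

I searched for operators that use optional. In those I found, it is the inputs that are optional, and even when an output is optional it still has to be present, for example: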

- op : unique_consecutive
  args : (Tensor x, bool return_inverse = false, bool return_counts = false, int[] axis = {}, DataType dtype = DataType::FLOAT32)
  output : Tensor(out), Tensor(index), Tensor(counts)
  infer_meta :
      func : UniqueConsecutiveInferMeta
  kernel :
    func : unique_consecutive
    data_type : x
  optional : index, counts

def unique_consecutive(
    x,
    return_inverse=False,
    return_counts=False,
    axis=None,
    dtype="int64",
    name=None,
):
...
    if axis is None:
        axis = []
    else:
        axis = [axis]
    attr_dtype = convert_np_dtype_to_dtype_(dtype)
    if in_dynamic_or_pir_mode():
        out, inverse, counts = _C_ops.unique_consecutive(
            x, return_inverse, return_counts, axis, attr_dtype
        )
        outs = [out]
        if return_inverse:
            outs.append(inverse)
        if return_counts:
            outs.append(counts)
        if len(outs) == 1:
            return outs[0]
        return tuple(outs)
...

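Even when the inverse result is not needed, the operator still outputs it, which means the decision is ultimately still made in Python 🤣🤣🤣

Never mind, I will add this parameter and see whether it turns out to be useful later.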

Charles-hit commented 4 months ago

The seed can be fixed; see the following code:

import paddle 
import torch
paddle.seed(10)
torch.manual_seed(10)
paddle.set_device("gpu:0")
print(torch.rand((2, 3), device='cuda'))
print(paddle.rand((2, 3)))

The outputs are identical. Back to the earlier question: this is how I tested the API results of the two frameworks:

import paddle 
import torch
import numpy as np
paddle.seed(10)
torch.manual_seed(10)
np_x = np.random.random(size=(20, 16, 50, 32))
torch_x = torch.tensor(np_x, device='cuda',dtype=torch.float32)
out_torch = torch.nn.functional.fractional_max_pool2d(torch_x, 3, output_size=(13, 12))
pd_x = paddle.to_tensor(np_x, place="gpu:0", dtype="float32")
out_pd = paddle.nn.functional.fractional_max_pool2d(pd_x, output_size=(13, 12))
out_torch_np = out_torch.cpu().detach().numpy()
out_pd_np = out_pd.numpy()
np.testing.assert_allclose(out_torch_np, out_pd_np, atol=0, rtol=1e-6)

megemini commented 4 months ago

The seed can be fixed; see the following code:

That uses the GPU random numbers. On the CPU they still differ: 🤣🤣🤣

In [15]: import paddle
    ...: import torch
    ...: paddle.seed(2023)
    ...: torch.manual_seed(2023)
    ...: paddle.set_device("cpu")
    ...: print(torch.rand((2, 3), device='cpu'))
    ...: print(paddle.rand((2, 3)))
tensor([[0.4290, 0.7201, 0.9481],
        [0.4797, 0.5414, 0.9906]])
Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
       [[0.86583614, 0.52014720, 0.25960937],
        [0.90525323, 0.42400089, 0.40641287]])

Back to the earlier question: this is how I tested the API results of the two frameworks

The algorithms differ, so the results are bound to differ. Let me give an example (it is also the example I wrote for fractional_max_pool in pooling.py):

Take 1-D data as an example. Suppose the sequence is [2, 4, 3, 1, 5, 2, 3], of length 7, the desired output_size is 5, and random_u is set to 0.3. Then:

a_i = ceiling(α(i + u)), α = 7/5 = 1.4. With 0-based indexing (a_i - 1), this gives start indices [0, 1, 3, 4, 6] and end indices [1, 3, 4, 6, 7], and hence the random sequence from the paper, [1, 2, 1, 2, 1]. In other words, the original sequence is split according to this random sequence into [2 | 4, 3 | 1 | 5, 2 | 3].

Here is the key point:

Method 1: torch's approach

Pool with kernel_size. Taking your value of 3, the grids are:

["2 | 4, 3" | 1 | 5, 2 | 3] -> [2, 4, 3] -> max = 4
[2 | "4, 3 | 1" | 5, 2 | 3] -> [4, 3, 1] -> max = 4
[2 | 4, 3 | "1 | 5, 2" | 3] -> [1, 5, 2] -> max = 5
[2 | 4, 3 | 1 | "5, 2 | 3"] -> [5, 2, 3] -> max = 5
[2 | 4, 3 | 1 | 5, 2 | "3"] -> [3] -> max = 3

The final result is [4, 4, 5, 5, 3]

Method 2: the approach of tensorflow, theano and paddle

Without kernel_size: [2 | 4, 3 | 1 | 5, 2 | 3] -> [2] + [4, 3] + [1] + [5, 2] + [3] -> max = [2] + [4] + [1] + [5] + [3]

The final result is [2, 4, 1, 5, 3]
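The 1-D example above can be reproduced with a short pure-Python sketch. It follows the formula and the two illustrations exactly as written in this comment; it is neither Paddle's nor PyTorch's kernel code, and the function names are hypothetical. The clamp on the last end index is an added safeguard that this example does not trigger.

```python
import math


def fractional_bounds(n_in, n_out, u):
    """0-based [start, end) bounds from a_i = ceil(alpha * (i + u))."""
    alpha = n_in / n_out
    starts = [math.ceil(alpha * (i + u)) - 1 for i in range(n_out)]
    ends = [
        min(math.ceil(alpha * (i + 1 + u)) - 1, n_in) for i in range(n_out)
    ]
    return starts, ends


def pool_disjoint(seq, n_out, u):
    # Method 2: pool inside each region, no kernel_size involved.
    starts, ends = fractional_bounds(len(seq), n_out, u)
    return [max(seq[s:e]) for s, e in zip(starts, ends)]


def pool_fixed_window(seq, n_out, u, kernel_size):
    # Method 1 as illustrated above: a fixed-size window anchored at each
    # region start; slicing clips the window at the end of the sequence.
    starts, _ = fractional_bounds(len(seq), n_out, u)
    return [max(seq[s:s + kernel_size]) for s in starts]


seq = [2, 4, 3, 1, 5, 2, 3]
bounds = fractional_bounds(len(seq), 5, 0.3)
disjoint = pool_disjoint(seq, 5, 0.3)
windowed = pool_fixed_window(seq, 5, 0.3, kernel_size=3)
print(bounds)
print(disjoint)
print(windowed)
```

The two result lists differ only because of the window width: the region starts are identical in both methods.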

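You can see the difference. Because torch has kernel_size, each pooling window spans several grids. In the pooling we commonly use, such as MaxPool2D, kernel_size and strides are normally equal, so no window spans several grids, because that would severely distort the output. See the description of the strides parameter of MaxPool2D:

stride (int|list|tuple, optional) – ... Default None, then stride will be equal to the kernel_size.

In fractional_max_pool, strides equals kernel_size and both keep changing, so no single suitable kernel_size can be set.

I hope that explains it clearly. 🤣🤣🤣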

paddle-ci-bot[bot] commented 4 months ago

Sorry to inform you that 331cd95's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Charles-hit commented 4 months ago

FractionalMaxPool2d

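@megemini Hello, after internal discussion, everyone would still like to have a kernel_size parameter. It can default to None, which keeps the current logic unchanged; when it is not None, the fixed kernel_size is used. Would that work for you? There was a similar case before, and in the end a separate API had to be developed again.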

megemini commented 4 months ago

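@megemini Hello, after internal discussion, everyone would still like to have a kernel_size parameter. It can default to None, which keeps the current logic unchanged; when it is not None, the fixed kernel_size is used. Would that work for you? There was a similar case before, and in the end a separate API had to be developed again.

Yes, that works!

Over the past few days I asked Benjamin Graham, the author of fractional max pooling, about this. His reply was:

Hello. My original implementation (for sparse ConvNets) generated regions using this code: https://github.com/btgraham/SparseConvNet-archived/blob/bdde325c28f64b895cebfdbe301a2ddca7870174/SparseConvNet/Regions.cu#L31

Have a look at his code; only the PseudorandomXXX parts matter here: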

PseudorandomOverlappingFmpTicks::PseudorandomOverlappingFmpTicks(int nIn,
                                                                 int nOut,
                                                                 int poolSize,
                                                                 RNG &rng) {
  assert(nIn >= nOut - 1 + poolSize);
  double alpha = (nIn - poolSize) * 1.0 / (nOut - 1);
  double u = rng.uniform(0, 1);
  for (int j = 0; j < nOut; ++j) {
    int i = (int)((j + u) * alpha) - (int)(u * alpha);
    inputL.push_back(i);
    inputR.push_back(i + poolSize); // megemini: poolSize is used here
  }
  assert(inputR.back() == nIn);

  // megemini: the code below only does min/max comparisons and can be ignored
  outputL.resize(nIn, nOut);
  outputR.resize(nIn, 0);
  for (int i = 0; i < nOut; i++) {
    for (int j = inputL[i]; j < inputR[i]; j++) {
      outputL[j] = std::min(outputL[j], i);
      outputR[j] = std::max(outputR[j], i + 1);
    }
  }
}

PseudorandomNonOverlappingFmpTicks::PseudorandomNonOverlappingFmpTicks(
    int nIn, int nOut, int poolSize, RNG &rng) {
  double alpha = nIn * 1.0 / nOut;
  double u = rng.uniform(0, 1);
  assert(nIn >= nOut - 1 + poolSize);
  assert((int)ceil(alpha) == poolSize);
  for (int j = 0; j < nOut; ++j)
    inputL.push_back((int)((j + u) * alpha) - (int)(u * alpha));
  for (int j = 1; j <= nOut; ++j)
    inputR.push_back((int)((j + u) * alpha) - (int)(u * alpha)); // megemini: poolSize is not used, same as paddle's earlier implementation
  assert(inputR.back() == nIn);

  // megemini: the code below only does min/max comparisons and can be ignored
  outputL.resize(nIn, nOut);
  outputR.resize(nIn, 0);
  for (int i = 0; i < nOut; i++) {
    for (int j = inputL[i]; j < inputR[i]; j++) {
      outputL[j] = std::min(outputL[j], i);
      outputR[j] = std::max(outputR[j], i + 1);
    }
  }
}

Based on his implementation, I drew a few figures:

As you can see, he uses poolSize in PseudorandomOverlappingFmpTicks but not in PseudorandomNonOverlappingFmpTicks.
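To make the boundary computation of the two constructors easier to compare, here is a hedged Python port of only the inputL/inputR part of the quoted C++. The function names are hypothetical, u is passed in explicitly instead of being drawn from an RNG, and the non-overlapping variant drops the poolSize argument because the original uses it only in assertions.

```python
def overlapping_ticks(n_in, n_out, pool_size, u):
    """Left/right bounds as in PseudorandomOverlappingFmpTicks."""
    assert n_in >= n_out - 1 + pool_size
    alpha = (n_in - pool_size) / (n_out - 1)
    lefts = [int((j + u) * alpha) - int(u * alpha) for j in range(n_out)]
    # Every window has the same width: pool_size.
    rights = [left + pool_size for left in lefts]
    assert rights[-1] == n_in
    return lefts, rights


def non_overlapping_ticks(n_in, n_out, u):
    """Left/right bounds as in PseudorandomNonOverlappingFmpTicks."""
    alpha = n_in / n_out
    ticks = [int((j + u) * alpha) - int(u * alpha) for j in range(n_out + 1)]
    # Each right bound is the next left bound, so windows never overlap.
    lefts, rights = ticks[:-1], ticks[1:]
    assert rights[-1] == n_in
    return lefts, rights


over = overlapping_ticks(7, 5, pool_size=2, u=0.3)
non_over = non_overlapping_ticks(7, 5, u=0.3)
print(over)
print(non_over)
```

For n_in = 7, n_out = 5 and u = 0.3, the non-overlapping bounds are the same regions as in the earlier 1-D example, while the overlapping bounds are fixed-width windows of size pool_size.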

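This differs from how his paper describes disjoint (non-overlapping) and overlapping regions:

According to the paper, the only difference between overlapping and non-overlapping is that the right boundary of an overlapping region is one larger than that of a non-overlapping region.

I asked him why, but he has not replied ...

As things stand, torch implements PseudorandomOverlappingFmpTicks, while tensorflow follows the paper and implements overlapping and non-overlapping without poolSize.

The best option at the moment is to add a pool size parameter:

None, the default: use the non-overlapping implementation, i.e. the one implemented so far
An integer: use the overlapping implementation, consistent with pytorch

What do you think?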

Charles-hit commented 4 months ago

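That works. As for the name, let's call it kernel_size. Paddle's existing pool APIs all use kernel_size, so using the same name here makes it easier for users to understand.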

megemini commented 4 months ago

Update 20240104

Add the kernel_size parameter
Add the corresponding unit tests
Add the return_mask parameter

To compare the results against torch, test code like the following can be used:

import numpy as np
import paddle
import torch

paddle.set_device('cpu')
# paddle.set_device('gpu')

if __name__ == '__main__':
    # input_shape = (1, 2, 12, 12)
    # torch_samples = [[[0.3, 0.3], [0.3, 0.3]]]
    # output_shape = (7, 7)
    # kernel_size = 2

    # input_shape = (1, 2, 27, 37)
    # torch_samples = [[[0.3, 0.3], [0.3, 0.3]]]
    # output_shape = (22, 29)
    # kernel_size = 2

    # input_shape = (1, 1, 5, 5)
    # torch_samples = [[[0.3, 0.3]]]
    # output_shape = (3, 3)
    # kernel_size = 2

    input_shape = (1, 1, 25, 55)
    torch_samples = [[[0.3, 0.3]]]
    output_shape = (13, 23)
    # kernel_size = 2
    kernel_size = [3, 2]

    input = np.random.rand(*input_shape)
    out_torch = torch.nn.functional.fractional_max_pool2d(
        torch.tensor(input), 
        kernel_size, 
        output_size=output_shape, 
        return_indices=True, 
        _random_samples=torch.tensor(torch_samples, dtype=torch.float64))

    out_paddle = paddle.nn.functional.fractional_max_pool2d(
        paddle.to_tensor(input), 
        output_shape, 
        kernel_size=kernel_size, 
        return_mask=True, 
        random_u=0.3)
    out_paddle_non = paddle.nn.functional.fractional_max_pool2d(paddle.to_tensor(input), output_shape, return_mask=True, random_u=0.3)

    print('input...')
    print(input)

    print('torch...')
    print(out_torch)

    print('paddle...')
    print(out_paddle)

    print('paddle non...')
    print(out_paddle_non)

    print('summary...')
    print(out_torch[0].numpy().shape, out_paddle[0].numpy().shape)

    np.testing.assert_allclose(out_torch[0].numpy(), out_paddle[0].numpy())

    print('-'*20)

    input_shape = (1, 1, 5, 5, 5)
    torch_samples = [[[0.3, 0.3, 0.3]]]
    output_shape = (3, 3, 3)

    input = np.random.rand(*input_shape)
    out_torch = torch.nn.functional.fractional_max_pool3d(
        torch.tensor(input), 
        2, 
        output_size=output_shape, 
        return_indices=True, 
        _random_samples=torch.tensor(torch_samples, dtype=torch.float64))

    out_paddle = paddle.nn.functional.fractional_max_pool3d(paddle.to_tensor(input), output_shape, kernel_size=2, return_mask=True, random_u=0.3)
    out_paddle_non = paddle.nn.functional.fractional_max_pool3d(paddle.to_tensor(input), output_shape, return_mask=True, random_u=0.3)

    print('input...')
    print(input)

    print('torch...')
    print(out_torch)

    print('paddle...')
    print(out_paddle)

    print('paddle non...')
    print(out_paddle_non)

    print('summary...')
    print(out_torch[0].numpy().shape, out_paddle[0].numpy().shape)

    np.testing.assert_allclose(out_torch[0].numpy(), out_paddle[0].numpy())

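The interface that takes kernel_size is consistent with torch. The test can be run on both cpu and gpu, and the results match.

This uses a fixed parameter u, i.e. torch's _random_samples and paddle's random_u, rather than a fixed seed.

Fixing the seed alone does not give the expected result. torch is somewhat liberal in how it uses this random number: it draws one per batch per channel, whereas I use a single random_u here, so fixing the seed alone cannot produce matching results across the two frameworks. A fixed seed does still guarantee reproducibility within a single framework.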
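Since torch takes one sample per batch element and channel while paddle takes a single random_u, the comparison has to repeat the same u across torch's _random_samples. A minimal sketch of building that nested list, assuming the (batch, channels, spatial dims) layout that torch_samples uses in the script above; the helper name is hypothetical:

```python
def broadcast_random_u(u, batch, channels, spatial_dims):
    """Repeat one u for every (batch, channel) pair and spatial dimension."""
    return [[[u] * spatial_dims for _ in range(channels)]
            for _ in range(batch)]


# 2-D pooling on an input of shape (1, 2, H, W)
print(broadcast_random_u(0.3, 1, 2, 2))
# 3-D pooling on an input of shape (1, 1, D, H, W)
print(broadcast_random_u(0.3, 1, 1, 3))
```

The result can be passed to torch.tensor(..., dtype=torch.float64) exactly as the script does with its hand-written torch_samples.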

Also, several of the windows ci jobs have failed again: the build does not complete, so they cannot run the tests.

@Charles-hit

luotao1 commented 4 months ago

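Several of the windows ci jobs have failed again: the build does not complete, so they cannot run the tests

https://github.com/PaddlePaddle/Paddle/pull/60528 has fixed this; you can rerun those jobs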

megemini commented 4 months ago

@luotao1 These windows ci jobs still fail to build. I reran them several times without success, and pulling and merging the latest code did not help either ...

megemini commented 4 months ago

@Charles-hit

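The windows ci environment still has problems. This operator involves nothing platform-specific, so it is very likely fine. Could you review the code first and see whether anything else needs discussion? Thanks!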

luotao1 commented 4 months ago

libphi.lib(pooling.cc.obj) : error LNK2019: unresolved external symbol "public: __cdecl pir::InterfaceValue::~InterfaceValue(void)" (??1InterfaceValue@pir@@QEAA@XZ) referenced in function "protected: void __cdecl std::_Tree<class std::_Tset_traits<class pir::InterfaceValue,struct std::less<class pir::InterfaceValue>,class std::allocator<class pir::InterfaceValue>,0> >::_Erase(struct std::_Tree_node<class pir::InterfaceValue,void *> *)" (?_Erase@?$_Tree@V?$_Tset_traits@VInterfaceValue@pir@@U?$less@VInterfaceValue@pir@@@std@@V?$allocator@VInterfaceValue@pir@@@4@$0A@@std@@@std@@IEAAXPEAU?$_Tree_node@VInterfaceValue@pir@@PEAX@2@@Z)

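Its pooling.cc.obj fails to build. As I understand it, phi and pir are independent, and phi now cannot find a pir symbol, so it has most likely introduced a bug
By default Windows does not export symbols, so the symbols in a generated library are not visible from outside it. I understand the pir team may have recently decoupled pir from phi, which left these symbols invisible on Windows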

comment from @xuxinyi389

megemini commented 4 months ago

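@luotao1 @xuxinyi389 Thank you both for helping track down the problem 👍👍👍

I just merged in the latest code. As things stand, although PR-CI-Windows-OPENBLAS failed, my operator's tests appear to have passed:

The other two windows ci jobs depend on this windows-openblas job, right? How should this be handled?

Thanks!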

xuxinyi389 commented 4 months ago

The other two jobs look fine; you can resolve the openblas pipeline problem first

megemini commented 4 months ago

The other two jobs look fine; you can resolve the openblas pipeline problem first

Thank you very much! :)

megemini commented 4 months ago

I took a look and there are no remaining problems. The static_check pipeline needs attention, though: the name and parameters of a newly added api must now match the yaml.
3. API's name and params should be consistent with op's name and params in yaml.
2024-01-09 11:03:51                 The API or Yaml file you changed may cause inconsistent.
2024-01-09 11:03:51  please request one of the RD (YuanRisheng, zyfncg, chenwhql, phlrain) 

Oh? How should I change it, then? Should I rename the function from

def fractional_max_pool2d(
    x,
    output_size,
    kernel_size=None,
    random_u=None,
    return_mask=False,
    name=None,
):

to

def fractional_max_pool2d_with_index(
    x,
    output_size,
    kernel_size=None,
    random_u=None,
    return_mask=False,
    name=None,
):

or should I instead modify the yaml, changing

- op : fractional_max_pool2d_with_index
  args : (Tensor x, int[] output_size, int[] kernel_size = {0, 0}, float random_u = 0.0, bool return_mask = true)
  output : Tensor(out), Tensor(mask)
  infer_meta :
    func : FractionalMaxPoolWithIndexInferMeta
  kernel :
    func : fractional_max_pool2d_with_index
  backward : fractional_max_pool2d_with_index_grad

to

- op : fractional_max_pool2d
  args : (Tensor x, int[] output_size, int[] kernel_size = {0, 0}, float random_u = 0.0, bool return_mask = true)
  output : Tensor(out), Tensor(mask)
  infer_meta :
    func : FractionalMaxPoolWithIndexInferMeta
  kernel :
    func : fractional_max_pool2d_with_index
  backward : fractional_max_pool2d_with_index_grad

Also, name is now the only parameter that the operator does not have. How should that be handled?

Thanks! :)

Charles-hit commented 4 months ago

Renaming the op in the yaml should be enough; name does not need attention.
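For reference, the rule the static_check message states (API name and params consistent with the yaml op, with name ignored) can be approximated by a small check like the one below. This is only an illustration of the rule, not Paddle's actual static_check script. The helper names and the stub function are hypothetical.

```python
import inspect
import re


def yaml_arg_names(args):
    """Extract parameter names from a yaml `args` string."""
    inner = args.strip()[1:-1]  # drop the surrounding parentheses
    # Split on commas that are not inside a {...} default value.
    parts = re.split(r",\s*(?![^{}]*\})", inner)
    # Each part looks like "int[] kernel_size = {0, 0}": drop the default,
    # then keep the last word, which is the parameter name.
    return [part.split("=")[0].split()[-1] for part in parts]


def is_consistent(api, op_name, op_args, ignored=("name",)):
    """True if the API name and parameter list match the yaml op."""
    params = [p for p in inspect.signature(api).parameters
              if p not in ignored]
    return api.__name__ == op_name and params == yaml_arg_names(op_args)


def fractional_max_pool2d(x, output_size, kernel_size=None,
                          random_u=None, return_mask=False, name=None):
    """Stub with the same signature as the Python API in this PR."""


ARGS = ("(Tensor x, int[] output_size, int[] kernel_size = {0, 0}, "
        "float random_u = 0.0, bool return_mask = true)")

print(is_consistent(fractional_max_pool2d, "fractional_max_pool2d", ARGS))
print(is_consistent(fractional_max_pool2d,
                    "fractional_max_pool2d_with_index", ARGS))
```

Under this reading, only the op name in the yaml has to change. The kernel, infer_meta and backward entries can keep their existing names.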

megemini commented 4 months ago

Update 20240110

Renamed the operator

@Charles-hit Please review

megemini commented 4 months ago

Update 20240111

Removed the changes to op_compat.yaml
Renamed the operator and functions to avoid terms such as index/idx
Sorted the functions alphabetically

@Charles-hit @zyfncg Please review

jeff41404 commented 4 months ago

code is fine, but the design of API in rfc should be modified to be consistent with the code.

megemini commented 4 months ago

code is fine, but the design of API in rfc should be modified to be consistent with the code.

https://github.com/PaddlePaddle/community/pull/798