PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

High order differentiability of trace operator: The Op trace_grad doesn't have any gradop. #68556

Open · Adversarr opened this issue 1 month ago

Adversarr commented 1 month ago

### Describe the Bug

Environment: paddlepaddle 3.0.0b1 (py312_mac, i.e. Python 3.12 on macOS)

A RuntimeError is thrown when executing the following code. The trace operator does not support higher-order differentiation, even though its second-order derivative is trivial.

import paddle
from paddle.autograd import jacobian

B, D = 10, 5

model = paddle.nn.Sequential(
    paddle.nn.Linear(D, 16),
    paddle.nn.Tanh(),
    paddle.nn.Linear(16, D),
)
x = paddle.randn([B, D])
x.stop_gradient = False

def jac(u, x):
    return jacobian(u, x, batch_axis=0)[::]

def div(u, x):
    return paddle.trace(jac(u, x), offset=0, axis1=-1, axis2=-2)

u = model(x)
print(div(jac(div(u, x), x), x))

Error msg:


Traceback (most recent call last):
  File "bug.py", line 21, in <module>
    print(div(jac(div(u, x), x), x))
              ^^^^^^^^^^^^^^^^^
  File "bug.py", line 15, in jac
    return jacobian(u, x, batch_axis=0)[::]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/autograd/autograd.py", line 98, in __getitem__
    return self._jacobian[indexes]
           ~~~~~~~~~~~~~~^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/autograd/autograd.py", line 295, in __getitem__
    [self._cached_evaluate(i) for i in lazy_indexes],
     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/autograd/autograd.py", line 312, in _cached_evaluate
    v = self._evaluate(k)
        ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/autograd/autograd.py", line 398, in _evaluate
    _grad_for_jacobian(self._flatten_ys[:, row_index], self._xs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/autograd/autograd.py", line 702, in _grad_for_jacobian
    xs_grad = paddle.grad(ys, xs, v, create_graph=True, allow_unused=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/base/wrapped_decorator.py", line 26, in __impl__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/base/framework.py", line 661, in __impl__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/paddle/lib/python3.12/site-packages/paddle/base/dygraph/base.py", line 815, in grad
    return core.eager.run_partial_grad(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: (Unavailable) The Op trace_grad doesn't have any gradop. If you don't intend calculating higher orderderivatives, please set `create_graph`to False. (at /Users/paddle/xly/workspace/1622fd51-38d0-4d14-8168-9796039928d6/Paddle/paddle/fluid/eager/api/generated/eager_generated/backwards/nodes.cc:40806)
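
A possible interim workaround, sketched against the reproduction above (unverified here, and it assumes the other ops in the graph, such as matmul, tanh, multiply, and sum, do have higher-order gradients registered): rewrite div so that the trace is expressed with basic operators and no trace_grad node ever enters the backward graph.

import paddle
from paddle.autograd import jacobian

B, D = 10, 5

model = paddle.nn.Sequential(
    paddle.nn.Linear(D, 16),
    paddle.nn.Tanh(),
    paddle.nn.Linear(16, D),
)
x = paddle.randn([B, D])
x.stop_gradient = False

def jac(u, x):
    return jacobian(u, x, batch_axis=0)[::]

def div(u, x):
    # trace over the last two axes equals sum(J * I); elementwise multiply
    # and sum are basic ops, so trace_grad is never created.
    J = jac(u, x)
    eye = paddle.eye(J.shape[-1], dtype=J.dtype)
    return paddle.sum(J * eye, axis=[-2, -1])

u = model(x)
print(div(jac(div(u, x), x), x))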

### Additional Supplementary Information

_No response_
Adversarr commented 1 month ago

Many other simple (even linear) operators, such as reshape, also do not support higher-order derivatives. Even sin/cos do not have higher-order derivatives (order ≥ 4). PyTorch supports all of these.
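
For reference, a minimal PyTorch sketch of that claim (not tied to a specific torch version): differentiating sin four times needs nothing beyond create_graph=True.

import torch

# After four rounds of differentiation, d^4/dx^4 sin(x) should be sin(x)
# again (sin -> cos -> -sin -> -cos -> sin).
x = torch.randn(3, requires_grad=True)
g = torch.sin(x)
for _ in range(4):
    (g,) = torch.autograd.grad(g.sum(), x, create_graph=True)
print((g - torch.sin(x)).abs().max())  # expected to be ~0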

lxd-cumt commented 1 month ago

trace_grad has no backward operator of its own (i.e., trace_double_grad is not implemented), so higher-order differentiation is not supported. There are two ways to add it:
1) Implement trace_double_grad, trace_triple_grad, and other higher-order operators. This is costly and only ever covers a finite number of orders.
2) Use the composite operator mechanism: implement trace_grad as a composition of basic operators, so that second- and higher-order derivatives of trace_grad only involve the basic operator set. In theory this supports arbitrarily high orders.
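
As a rough illustration of option 2 (only the math of the decomposition, not Paddle's actual composite-operator registration): the VJP of y = trace(A) over the last two axes is dL/dA = dL/dy · I, which can be written with eye, unsqueeze, and multiply alone, so differentiating it again never needs a dedicated trace_double_grad kernel.

import paddle

def trace_grad_composite(grad_out, n, dtype="float32"):
    # Hypothetical helper, not a framework API. d trace(A) / dA = I, so the
    # VJP broadcasts grad_out onto the diagonal:
    # grad_A[..., i, j] = grad_out[...] * (i == j).
    eye = paddle.eye(n, dtype=dtype)
    return grad_out.unsqueeze(-1).unsqueeze(-1) * eye

# Shape check: a (B,)-shaped grad_out maps to a (B, n, n)-shaped gradient.
g = paddle.ones([4])
print(trace_grad_composite(g, 5).shape)  # [4, 5, 5]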

lxd-cumt commented 1 month ago

Let me check who on the framework team is currently supporting the composite operator mechanism.

lxd-cumt commented 1 month ago

> Many other simple (even linear) operators, such as reshape, also do not support higher-order derivatives. Even sin/cos do not have higher-order derivatives (order ≥ 4). PyTorch supports all of these.

These also need composite operator support; otherwise we would have to keep designing higher-order operators one by one, which is far too costly.
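
A tiny plain-Python sketch (purely illustrative, no Paddle APIs) of why a composite rule removes the per-order cost for an op like sin: once sin_grad is expressed with the primitives sin, cos, and negation, further differentiation only ever produces the same primitives, so no sin_double_grad, sin_triple_grad, ... kernels are needed.

# Symbolically track d^n/dx^n sin(x); every step stays inside the primitive
# set {sin, cos, negation}, never requiring an order-specific rule.
def nth_derivative_of_sin(n):
    sign, fn = 1, "sin"
    for _ in range(n):
        if fn == "sin":       # d/dx sin(x) = cos(x)
            fn = "cos"
        else:                 # d/dx cos(x) = -sin(x)
            fn, sign = "sin", -sign
    return sign, fn

for order in range(5):
    sign, fn = nth_derivative_of_sin(order)
    print(f"d^{order}/dx^{order} sin(x) = {'-' if sign < 0 else ''}{fn}(x)")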