PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
http://www.paddlepaddle.org/
Apache License 2.0
22.16k stars 5.57k forks

Memory Leak Bugs Reported by PyRefCon (static analyzer report) #58025

Open nonsense-j opened 1 year ago

nonsense-j commented 1 year ago

Bugs Overview

PyRefCon (by @Snape3058) is a static analyzer that detects memory-related bugs (refcounting issues) by tracing the reference counts of Python objects. Analyzing Paddle 2.5 (latest) with PyRefCon, we found 36 bugs, all of which are memory leaks. After manually reviewing them, we grouped them by leak location; the reports below fall into three bug types: PADDLE_THROW Leak, PADDLE_ENFORCE Leak, and Pure Leak.

All reports can be accessed in our reports archive. Below, each bug report is listed under the file it is located in, for easier review.

Bugs in paddle/fluid/eager/pylayer/py_layer_node.cc

  1. Report Page: report-c082e3. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to backward_args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L59 Control then reaches PADDLE_THROW, and backward_args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L107-L108

  2. Report Page: report-4e5037. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to backward_fn: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L104-L105 Control then reaches PADDLE_THROW, and backward_fn goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L115-L116

  3. Report Page: report-2d9225. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to outputs: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L112 Control then reaches PADDLE_THROW, and outputs goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L115-L116

  4. Report Page: report-6a85d9. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to outputs_tuple: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L125 Control then reaches PADDLE_THROW, and outputs_tuple goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/eager/pylayer/py_layer_node.cc#L133-L137

Bugs in paddle/fluid/pybind/eager_py_layer.cc

  1. Report Page: report-e2b657. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to backward_function: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L134-L135 Control then reaches PADDLE_THROW, and backward_function goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L143-L144

  2. Report Page: report-d7fc4d. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to kwargs_value_list: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L158 Control then reaches PADDLE_THROW, and kwargs_value_list goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L255-L256

  3. Report Page: report-08a05f. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to forward_args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L164 Control then reaches PADDLE_THROW, and forward_args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L255-L256

  4. Report Page: report-cc56de. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to forward_fn: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L253 Control then reaches PADDLE_THROW, and forward_fn goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L376-L377

  5. Report Page: report-acf21f. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to outputs: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L260 Control then reaches PADDLE_THROW, and outputs goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L376-L377

  6. Report Page: report-7f1356. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to outputs_tuple: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L274 Control then reaches PADDLE_THROW, and outputs_tuple goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L376-L377

  7. Report Page: report-a2d9e2. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to outputs_tuple: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L276 Control then reaches PADDLE_THROW, and outputs_tuple goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L376-L377

  8. Report Page: report-7ea1ff. Bug Type: Pure Leak. A new reference is returned and assigned to saved_value: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L525 As the report path demonstrates, saved_value goes out of scope without its refcnt being decreased.

  9. Report Page: report-1bb618. Bug Type: Pure Leak. A new reference is returned and assigned to saved_value: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L527 As the report path demonstrates, saved_value goes out of scope without its refcnt being decreased.

  10. Report Page: report-4b54e6. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to packed_value: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L536 Control then reaches PADDLE_THROW, and packed_value goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L580-L582

  11. Report Page: report-35548a. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to tmp_list: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L547 Control then reaches PADDLE_THROW, and tmp_list goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L556-L558

  12. Report Page: report-a3d8cb. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to tmp_tuple: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L564 Control then reaches PADDLE_THROW, and tmp_tuple goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_py_layer.cc#L573-L575

Bugs in paddle/fluid/pybind/eager_utils.cc

  1. Report Page: report-e59e5d. Bug Type: Pure Leak. A new reference is returned and assigned to mod: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L263 As the report path demonstrates, mod goes out of scope without its refcnt being decreased.

  2. Report Page: report-ac9dfb. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to value: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L985 Control then reaches PADDLE_THROW, and value goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L988-L989

  3. Report Page: report-408609. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to result: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L860 Control then reaches PADDLE_THROW, and result goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L876-L877

  4. Report Page: report-8b3957. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to dict: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L980 Control then reaches PADDLE_THROW, and dict goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L988-L989

  5. Report Page: report-77d4a3. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to dict: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1002 Control then reaches PADDLE_THROW, and dict goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1007-L1008

  6. Report Page: report-98837d. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to key_string: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1005 Control then reaches PADDLE_THROW, and key_string goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1016-L1017

  7. Report Page: report-0199e9. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to py_list: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1012 Control then reaches PADDLE_THROW, and py_list goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1016-L1017

  8. Report Page: report-429c89. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to dict: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1039 Control then reaches PADDLE_THROW, and dict goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1045-L1046

  9. Report Page: report-e1eb58. Bug Type: Pure Leak. A new reference is returned and assigned to py_func_: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1870 As the report path demonstrates, py_func_ goes out of scope without its refcnt being decreased.

  10. Report Page: report-ee0617. Bug Type: PADDLE_ENFORCE Leak. A new reference is returned and assigned to args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1920 Control then reaches a PADDLE_ENFORCE_XX assertion, and args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1923-L1925

  11. Report Page: report-318da6. Bug Type: PADDLE_ENFORCE Leak. A new reference is returned and assigned to args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1935 Control then reaches a PADDLE_ENFORCE_XX assertion, and args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1938-L1941

  12. Report Page: report-6e8fb1. Bug Type: PADDLE_ENFORCE Leak. A new reference is returned and assigned to args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1959 Control then reaches a PADDLE_ENFORCE_XX assertion, and args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1963-L1965

  13. Report Page: report-801c9b. Bug Type: Pure Leak. A new reference is returned and assigned to reinterpret_cast<PyObject*>(packed_value->get()): https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1960 As the report path demonstrates, reinterpret_cast<PyObject*>(packed_value->get()) goes out of scope without its refcnt being decreased.

  14. Report Page: report-37c131. Bug Type: PADDLE_ENFORCE Leak. A new reference is returned and assigned to args: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1984 Control then reaches a PADDLE_ENFORCE_XX assertion, and args goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/a2a1a787d8a40459094d728ada03a4b55ffc3d0f/paddle/fluid/pybind/eager_utils.cc#L1988-L1990

Bugs in paddle/fluid/pybind/op_function_common.cc

  1. Report Page: report-bcf9a5. Bug Type: Pure Leak. A new reference is returned and assigned to to: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L81 As the report path demonstrates, to goes out of scope without its refcnt being decreased.

  2. Report Page: report-f5f447. Bug Type: Pure Leak. A new reference is returned and assigned to to: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L100 As the report path demonstrates, to goes out of scope without its refcnt being decreased.

  3. Report Page: report-271ab5. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to item: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L419 Control then reaches PADDLE_THROW, and item goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L423-L429

  4. Report Page: report-b7acd3. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to item: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L494 Control then reaches PADDLE_THROW, and item goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L498-L504

  5. Report Page: report-79f296. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to item: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L573 Control then reaches PADDLE_THROW, and item goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L577-L583

  6. Report Page: report-9538ed. Bug Type: PADDLE_THROW Leak. A new reference is returned and assigned to item: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L648 Control then reaches PADDLE_THROW, and item goes out of scope without its refcnt being decreased: https://github.com/PaddlePaddle/Paddle/blob/60adf3a499df394da69a86b018bb13a8a665b9d3/paddle/fluid/pybind/op_function_common.cc#L662-L667

Fix Suggestions

To fix the above issues, we suggest replacing the raw pointers in the code with a smart pointer implementation, so that the reference count is decreased automatically when an exception is thrown, e.g. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/lib/core/safe_pyobject_ptr.h

nonsense-j commented 1 year ago

Submitted by mistake, this issue will be reopened later.

wanghuancoder commented 1 year ago

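Thank you very much for the suggestion! In hindsight, smart pointers are indeed easier to manage; this was a mistake in my early design. Switching to smart pointers at this point is a considerable amount of work, but we will schedule the optimization as early as we can!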

Snape3058 commented 1 year ago

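Apart from the pure leaks, where the reference count decrement was simply forgotten, all the others are cases where a temporarily used object does not get its reference count decreased when an exception is thrown. Based on the issues the tool reported, the other usages throughout the project can be checked further. Thanks for the reply; we also hope you can give a confirmation result after verifying the issues above. If possible, we would like the confirmation result to be given in English. Other discussion can continue in Chinese.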
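The two categories can be imitated from Python as a rough sketch. This is not Paddle code: it assumes CPython and uses `ctypes.pythonapi.Py_IncRef` / `Py_DecRef` to stand in for a C++ function that receives a new reference, and the names `pure_leak`, `exception_path_leak`, and `leaked_refs` are hypothetical helpers made up for illustration.

```python
import ctypes
import sys

# Py_IncRef / Py_DecRef are the function forms of Py_INCREF / Py_DECREF.
# Calling them through ctypes is an assumption of this sketch (CPython only).
_incref = ctypes.pythonapi.Py_IncRef
_decref = ctypes.pythonapi.Py_DecRef
for _fn in (_incref, _decref):
    _fn.argtypes = [ctypes.py_object]
    _fn.restype = None


def pure_leak(obj):
    """Take a new reference and never release it."""
    _incref(obj)


def exception_path_leak(obj, fail):
    """Release the reference on the normal path only."""
    _incref(obj)  # stands in for receiving a new reference
    if fail:
        # Stands in for PADDLE_THROW: the release below is skipped.
        raise ValueError("simulated throw")
    _decref(obj)


def leaked_refs(func, *args):
    """Return how many references to a fresh object `func` leaves behind."""
    probe = object()
    before = sys.getrefcount(probe)
    try:
        func(probe, *args)
    except ValueError:
        pass  # the traceback (and its frames) is released after this handler
    return sys.getrefcount(probe) - before
```

Under these assumptions, `leaked_refs(pure_leak)` and `leaked_refs(exception_path_leak, True)` should each report one leftover reference, while `leaked_refs(exception_path_leak, False)` should report none, which matches the point that the non-pure leaks only show up on the exception path.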

TimeYWL commented 9 months ago

Is there any workaround method?

Snape3058 commented 9 months ago

> Is there any workaround method?

The Fix Suggestions section at the end of the issue report can serve as a workaround.

Using RAII-based monitor objects, such as smart pointers, may help to automatically decrease the reference count when the function exits unexpectedly.
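The idea can be sketched in Python as an analogy; the actual fix would be a C++ smart pointer such as the one linked in the fix suggestion. The sketch assumes CPython, uses `ctypes.pythonapi.Py_IncRef` / `Py_DecRef` to imitate taking and dropping a reference, and `OwnedReference`, `guarded_call`, and `leaked_refs` are hypothetical names introduced only for this example.

```python
import ctypes
import sys

# Function forms of Py_INCREF / Py_DECREF, reached through ctypes.
# This is an assumption of the sketch and only works on CPython.
_incref = ctypes.pythonapi.Py_IncRef
_decref = ctypes.pythonapi.Py_DecRef
for _fn in (_incref, _decref):
    _fn.argtypes = [ctypes.py_object]
    _fn.restype = None


class OwnedReference:
    """Hold one extra reference to `obj` and release it on every exit path."""

    def __init__(self, obj):
        self._obj = obj

    def __enter__(self):
        _incref(self._obj)  # stands in for receiving a new reference
        return self._obj

    def __exit__(self, exc_type, exc, tb):
        _decref(self._obj)  # runs on normal exit and when an exception propagates
        return False        # do not swallow the exception


def guarded_call(obj, fail):
    """Use the guard; raising inside the block still releases the reference."""
    with OwnedReference(obj):
        if fail:
            raise ValueError("simulated throw")


def leaked_refs(func, *args):
    """Return how many references to a fresh object `func` leaves behind."""
    probe = object()
    before = sys.getrefcount(probe)
    try:
        func(probe, *args)
    except ValueError:
        pass
    return sys.getrefcount(probe) - before
```

With the guard in place, `leaked_refs(guarded_call, True)` and `leaked_refs(guarded_call, False)` should both report zero leftover references, because the release no longer depends on reaching a particular line of the function.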

TimeYWL commented 9 months ago

When I run my demo, the memory keeps increasing without any exception being thrown:

import numpy as np
import paddle
from paddle import _legacy_C_ops
from paddle.autograd import PyLayer
from tqdm import tqdm

class dropout(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.scale = 1.0
        self.bias = 0.5
    def forward(self, x):
        # A new PyLayer subclass is defined on every forward call.
        class c_dropout_eager(PyLayer):
            @staticmethod
            def forward(ctx, tensor):
                return _legacy_C_ops.scale(tensor, 'scale', self.scale, 'bias', self.bias)
            @staticmethod
            def backward(ctx, dy):
                # Only dy is in scope here; the forward input `tensor` is not.
                return _legacy_C_ops.scale(dy, 'scale', self.scale, 'bias', self.bias)
        return c_dropout_eager.apply(x)

drop_demo = dropout()

for i in tqdm(range(500000)):
    np_data = np.random.rand(512, 4096)

    output = drop_demo(paddle.to_tensor(np_data, dtype='float32'))

How does this happen?

Snape3058 commented 9 months ago

> When I run my demo, the memory keeps increasing without any exception being thrown:

These reports are generated with our static analyzer. Hence, we do not know how to trigger them with actual Python code.

Besides, the aforementioned bugs happen when an exception is thrown (for bugs not categorized as "pure leak"). If no exceptions are thrown during execution, the corresponding execution path may not follow the one we reported.

For details of the reported path, please click the "report-xxxxxx" link of each report.