PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed deep-learning/machine-learning training and cross-platform deployment)
http://www.paddlepaddle.org/
Apache License 2.0

LSTM API could_use_cudnn segmentation fault #64989

Open DrRyanHuang opened 3 weeks ago

DrRyanHuang commented 3 weeks ago

Many thanks to the Paddle community for its contributions to open source!


Environment

Python 3.9, Paddle 2.6.0, CUDA cuda_12.2.r12.2/compiler.32965470_0

Minimal reproduction code

Below is the function I wrote to convert the weights of a torch LSTM to a Paddle LSTM:

import numpy as np
import paddle
import torch


def LSTMWithProjection_torch2paddle(lstm_paddle, lstm_torch):

    # pd_model_state_dict = lstm_paddle.state_dict()
    pd_model_state_dict = {}
    tc_model_state_dict = lstm_torch.state_dict()

    # print(
    #     pd_model_state_dict['lstm.weight_ih_l0'] is pd_model_state_dict['lstm.0.cell.weight_ih'],
    #     pd_model_state_dict['lstm.weight_hh_l0'] is pd_model_state_dict['lstm.0.cell.weight_hh'],
    #     pd_model_state_dict['lstm.bias_ih_l0'] is pd_model_state_dict['lstm.0.cell.bias_ih'],
    #     pd_model_state_dict['lstm.bias_hh_l0'] is pd_model_state_dict['lstm.0.cell.bias_hh']
    # )

    # copy under the flattened keys (lstm.weight_ih_l0, ...)
    pd_model_state_dict['lstm.weight_ih_l0'] = paddle.to_tensor(
        tc_model_state_dict['lstm.weight_ih_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.weight_hh_l0'] = paddle.to_tensor(
        tc_model_state_dict['lstm.weight_hh_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.bias_ih_l0'] = paddle.to_tensor(
        tc_model_state_dict['lstm.bias_ih_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.bias_hh_l0'] = paddle.to_tensor(
        tc_model_state_dict['lstm.bias_hh_l0'].detach().cpu().numpy()
    )

    # the same tensors again under the per-cell keys (lstm.0.cell.*)
    pd_model_state_dict['lstm.0.cell.weight_ih'] = paddle.to_tensor(
        tc_model_state_dict['lstm.weight_ih_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.0.cell.weight_hh'] = paddle.to_tensor(
        tc_model_state_dict['lstm.weight_hh_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.0.cell.bias_ih'] = paddle.to_tensor(
        tc_model_state_dict['lstm.bias_ih_l0'].detach().cpu().numpy()
    )
    pd_model_state_dict['lstm.0.cell.bias_hh'] = paddle.to_tensor(
        tc_model_state_dict['lstm.bias_hh_l0'].detach().cpu().numpy()
    )

    lstm_paddle.load_dict(pd_model_state_dict)

    # torch.nn.Linear stores weight as [out_features, in_features];
    # paddle.nn.Linear stores [in_features, out_features], hence the transpose
    lstm_paddle.linear.weight.set_value(
        paddle.to_tensor(lstm_torch.linear.weight.data.cpu().numpy().T)
    )

    return lstm_paddle 
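
As a quick sanity check (a sketch; it assumes the flattened parameter names such as weight_ih_l0, which appear as state_dict keys above, are also accessible as attributes on the layer), the copied matrices can be compared directly:

import numpy as np

# Sketch: verify the copied weights match after conversion.
# Assumes the RNN layer exposes the flattened parameters as attributes,
# matching the 'lstm.weight_ih_l0' keys that appear in state_dict().
w_pd = lstm_paddle.lstm.weight_ih_l0.numpy()
w_tc = lstm_torch.lstm.weight_ih_l0.detach().cpu().numpy()
assert np.allclose(w_pd, w_tc), "weight_ih_l0 mismatch after conversion"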

Here is the test code, which can be run directly:


class LSTMWithProjection_torch(torch.nn.Module):
    def __init__(self, input_size, hidden_size, proj_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.proj_size = proj_size
        self.lstm = torch.nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = torch.nn.Linear(hidden_size, proj_size, bias=False)

    def forward(self, x):
        # self.lstm.flatten_parameters()
        o, (_, _) = self.lstm(x)
        return self.linear(o)

class LSTMWithProjection(paddle.nn.Layer):
    def __init__(self, input_size, hidden_size, proj_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.proj_size = proj_size
        self.lstm = paddle.nn.LSTM(input_size, hidden_size)  # batch-first by default (time_major=False)
        self.linear = paddle.nn.Linear(hidden_size, proj_size, bias_attr=False)

    def forward(self, x):
        # self.lstm.flatten_parameters()
        o, (_, _) = self.lstm(x)
        return self.linear(o)

if __name__ == "__main__":

    # ---------- test ----------
    input_size, hidden_size, proj_size = 80, 768, 256

    # the two LSTM models
    lstm_paddle = LSTMWithProjection(input_size, hidden_size, proj_size)
    lstm_torch  = LSTMWithProjection_torch(input_size, hidden_size, proj_size).cuda()

    # transfer the LSTM parameters
    lstm_paddle = LSTMWithProjection_torch2paddle(lstm_paddle, lstm_torch)

    # input data
    x = np.random.rand(10, 250, 80).astype("float32")
    x_tc = torch.from_numpy(x).cuda()
    x_pd = paddle.to_tensor(x)

    lstm_paddle.lstm.could_use_cudnn = False  # without this line the cuDNN path segfaults, see below

    y_pd, (_, _) = lstm_paddle.lstm(x_pd)
    y_tc, (_, _) = lstm_torch.lstm(x_tc)

    y_pd = y_pd.numpy()
    y_tc = y_tc.detach().cpu().numpy()

    print(abs(y_pd - y_tc).max())

    y_pd = lstm_paddle(x_pd)
    y_tc = lstm_torch(x_tc)

    y_pd = y_pd.numpy()
    y_tc = y_tc.detach().cpu().numpy()

    print(
        abs(y_pd - y_tc).max(),
        f"mean: {y_pd.mean() - y_tc.mean()}",
        f"std : {y_pd.std() - y_tc.std()}",
    )

If cuDNN is not manually disabled with the following line, the program crashes:

    lstm_paddle.lstm.could_use_cudnn = False

Error output:

W0607 15:11:35.219653 1625662 rnn_kernel.cu.cc:234] If the memory space of the Input WeightList is not continuous, less efficient calculation will be called. Please call flatten_parameters() to make the input memory continuous.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1717744295 (unix time) try "date -d @1717744295" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 1625662 (TID 0x7f1aa1cb7740) from PID 0 ***]
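
The warning at the top of the log hints at a possible workaround: re-flattening the weights after loading so the cuDNN kernel sees a single contiguous buffer. A minimal sketch, assuming paddle.nn.LSTM exposes the flatten_parameters() method the warning refers to (not verified on Paddle 2.6.0):

# Sketch: re-flatten the weight buffers after load_dict() so the cuDNN
# path sees contiguous memory again, instead of disabling cuDNN outright.
lstm_paddle = LSTMWithProjection_torch2paddle(lstm_paddle, lstm_torch)
lstm_paddle.lstm.flatten_parameters()  # the call suggested by the warning
y_pd, (_, _) = lstm_paddle.lstm(x_pd)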

I have two questions. The first is the segmentation fault above. The second concerns the commented-out check in the conversion function: in the Paddle LSTM's original state_dict, are the following pairs the same tensor?

    print(
        pd_model_state_dict['lstm.weight_ih_l0'] is pd_model_state_dict['lstm.0.cell.weight_ih'],
        pd_model_state_dict['lstm.weight_hh_l0'] is pd_model_state_dict['lstm.0.cell.weight_hh'],
        pd_model_state_dict['lstm.bias_ih_l0'] is pd_model_state_dict['lstm.0.cell.bias_ih'],
        pd_model_state_dict['lstm.bias_hh_l0'] is pd_model_state_dict['lstm.0.cell.bias_hh']
    )
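
For reference, the same check can be run on a freshly constructed layer's own state_dict; a sketch (that each pair prints True is my expectation from the duplicated keys, not something I have verified):

# Sketch: check whether the flattened keys alias the per-cell Parameters.
sd = LSTMWithProjection(80, 768, 256).lstm.state_dict()
print(sd['weight_ih_l0'] is sd['0.cell.weight_ih'])
print(sd['weight_hh_l0'] is sd['0.cell.weight_hh'])
print(sd['bias_ih_l0'] is sd['0.cell.bias_ih'])
print(sd['bias_hh_l0'] is sd['0.cell.bias_hh'])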

Additional Supplementary Information

No response

lizexu123 commented 3 weeks ago

Which torch version are you on? Let me try to reproduce this locally first.

DrRyanHuang commented 3 weeks ago

@lizexu123

>>> torch.__version__
'1.13.0+cu117-with-pypi-cudnn'

lizexu123 commented 3 weeks ago

Isn't your CUDA version 12.2, as stated above?

DrRyanHuang commented 3 weeks ago

The system CUDA is 12, but what I had installed at the time was 1.13.0+cu117 😂 I don't think this is really about torch; isn't the main issue that Paddle's parameter loading leaves the weight memory non-contiguous?
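
If that hypothesis is right, copying the values into the existing Parameters with set_value() instead of replacing them via load_dict() should leave the original storage (and any aliasing with the flattened cuDNN buffer) intact. A sketch under that assumption:

# Sketch: write into the existing Parameters in place so their storage is
# preserved, rather than swapping in freshly created tensors via load_dict().
def copy_lstm_weights_inplace(lstm_paddle, lstm_torch):
    tc_sd = lstm_torch.lstm.state_dict()
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        src = tc_sd[name].detach().cpu().numpy()
        getattr(lstm_paddle.lstm, name).set_value(paddle.to_tensor(src))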

lizexu123 commented 3 weeks ago

Let me align the torch and CUDA versions and try it.