NVIDIA / cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it.
MIT License

[Question] How to perform backpropagation for a conv + sigmoid layer? #113

Open · zhewenhu opened 1 month ago

zhewenhu commented 1 month ago

Hi,

I have implemented the forward pass as a convolution + sigmoid_fwd activation and am now working on the backpropagation of the graph. However, according to the documentation, a graph of sigmoid_bwd + dgrad/wgrad is not supported. I also tried building this graph, but got the error: No valid engine configs for SIGMOID_BWD_ConvBwdData_. Does cuDNN offer any alternatives or methods for implementing this backpropagation?

Here is my code for fprop:

graph_fwd = std::make_shared<fe::graph::Graph>();
graph_fwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

// Input tensor: NCHW dimensions with channels-last (NHWC) packed strides.
X = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("input")
                    .set_dim({n, c, h, w})
                    .set_stride({c * h * w, 1, c * w, c}));

W = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("weight")
                    .set_dim({k, c, r, s})
                    .set_stride({c * r * s, 1, c * s, c}));

auto conv_options =
    fe::graph::Conv_fprop_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
conv_output = graph_fwd->conv_fprop(X, W, conv_options);

auto sigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_FWD);
Y = graph_fwd->pointwise(conv_output, sigmoid_options);

// conv_output is marked as an output so it can be saved for sigmoid_bwd in the backward pass.
conv_output->set_output(true);
Y->set_output(true);

And here is the code for the dgrad graph I attempted, which fails with the error No valid engine configs for SIGMOID_BWD_ConvBwdData_:

graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

W_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("weight")
                        .set_dim(W->get_dim())
                        .set_stride(W->get_stride()));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

// SIGMOID_BWD takes the incoming gradient dY and the forward sigmoid's input (the conv output).
auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w});

auto dgrad_options = fe::graph::Conv_dgrad_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
dX = graph_d_bwd->conv_dgrad(dsigmoid_output, W_bwd, dgrad_options);
dX->set_dim({n, c, h, w}).set_output(true);

Anerudhan commented 1 month ago

Hi @zhewenhu ,

Thanks for posting this. Unfortunately, cuDNN does not support this backward graph pattern (sigmoid_bwd fused with dgrad/wgrad).

Instead, the suggestion is to split it into two graphs: one that does dSigmoid and another that does dgrad, as sketched below.
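
For illustration, a minimal sketch of the dgrad half of that split, reusing the shapes from the fprop code above, could look like the following (graph_dgrad, dsigmoid_out, and W_dgrad are illustrative names, and dsigmoid_out is assumed to hold the result of the standalone dSigmoid graph):

auto graph_dgrad = std::make_shared<fe::graph::Graph>();
graph_dgrad->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

// Gradient produced by the standalone dSigmoid graph, fed in here as a plain input.
auto dsigmoid_out = graph_dgrad->tensor(fe::graph::Tensor_attributes()
                        .set_name("dsigmoid_output")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

auto W_dgrad = graph_dgrad->tensor(fe::graph::Tensor_attributes()
                        .set_name("weight")
                        .set_dim({k, c, r, s})
                        .set_stride({c * r * s, 1, c * s, c}));

auto dgrad_options =
    fe::graph::Conv_dgrad_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
auto dX = graph_dgrad->conv_dgrad(dsigmoid_out, W_dgrad, dgrad_options);
dX->set_dim({n, c, h, w}).set_output(true);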

Let us know if you have a specific use case in mind.

Thanks

zhewenhu commented 1 month ago

Hi @Anerudhan ,

I also tried splitting them, but the standalone SIGMOID_BWD graph is also not supported, and I got the same error: No valid engine configs for SIGMOID_BWD_. Could you check if I did something wrong?

Here is the code:

graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w}).set_output(true);

Anerudhan commented 1 month ago

Hi @zhewenhu ,

I just took a look at this on an H100, and the code seems to pass. Do you know which GPU you are running this on?

Thanks
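
For anyone reproducing this, a rough sketch of finalizing and running the standalone dSigmoid graph follows (method names as in the cudnn_frontend v1 samples; the cudnn handle, the device pointers dY_dev / conv_out_dev / dsig_dev / workspace_ptr, and the workspace allocation are assumptions, and each returned status should be checked with is_good(), omitted here for brevity):

// Build stages: "No valid engine configs" errors typically surface at check_support.
auto status = graph_d_bwd->validate();
status = graph_d_bwd->build_operation_graph(handle);
status = graph_d_bwd->create_execution_plans({fe::HeurMode_t::A});
status = graph_d_bwd->check_support(handle);
status = graph_d_bwd->build_plans(handle);

// Query the workspace requirement, allocate that many bytes on device, then execute.
int64_t workspace_size = 0;
graph_d_bwd->get_workspace_size(workspace_size);
std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*> variant_pack = {
    {dY, dY_dev}, {conv_output_bwd, conv_out_dev}, {dsigmoid_output, dsig_dev}};
graph_d_bwd->execute(handle, variant_pack, workspace_ptr);

Since the same graph passes on an H100, the GPU model (and the cuDNN version) is the key detail to report back.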