I saw your pseudo-code for the backward function of GDL. However, it only returns the input's gradient (the backbone gradient) for learning in the backbone, while the gradient for the parameter layer A is None. How can autograd update the channel-wise weights without a gradient?
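For concreteness, here is a minimal sketch of how I read the pseudo-code (the class and argument names below are my own, not necessarily yours):

```python
import torch


class _GDLFunction(torch.autograd.Function):
    """My reading of the GDL pseudo-code: scale the backbone gradient,
    but return None in the gradient slots for A and the scale factor."""

    @staticmethod
    def forward(ctx, x, A, scale):
        ctx.save_for_backward(A)
        ctx.scale = scale
        # channel-wise affine: A broadcasts over (N, C, H, W)
        return x * A

    @staticmethod
    def backward(ctx, grad_output):
        (A,) = ctx.saved_tensors
        # Only the input's (backbone) gradient is returned, scaled;
        # the slots for A and scale are None.
        return grad_output * A * ctx.scale, None, None
```

If I read this correctly, A.grad would stay None after loss.backward(), so the optimizer would have nothing with which to update the channel-wise weights.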