I am very impressed with how you enforce constraints with Lagrange multipliers.
In the paper, I notice that affine layers are encoded as z(i) = W(i)z(i-1)+b(i), which captures the behavior of fully-connected, convolutional, and similar layers.
But an Add layer in a residual network (e.g. in an ONNX model) instead computes z(i) = z(i-1)+z(i-k), combining the output of the previous layer with a skip connection. I fail to see how your encoding extends to this case, yet I did observe residual networks in your experiments.
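For concreteness, here is my own guess at one possible reformulation (a minimal NumPy sketch, not something taken from the paper): the Add layer z(i) = z(i-1)+z(i-k) can be rewritten as an affine map over the stacked inputs [z(i-1); z(i-k)], with weight matrix W = [I | I] and zero bias, which would keep it within the affine form z(i) = W(i)z(i-1)+b(i).

```python
import numpy as np

# Hypothetical reformulation: a residual Add layer
#   z_i = z_{i-1} + z_{i-k}
# viewed as the affine map
#   z_i = W [z_{i-1}; z_{i-k}] + b,  with W = [I | I], b = 0.
n = 4
z_prev = np.arange(n, dtype=float)   # z_{i-1}, previous layer's output
z_skip = np.ones(n)                  # z_{i-k}, the skip connection

W = np.hstack([np.eye(n), np.eye(n)])            # [I | I]
b = np.zeros(n)
z_affine = W @ np.concatenate([z_prev, z_skip]) + b

# The affine form reproduces the Add layer exactly.
assert np.allclose(z_affine, z_prev + z_skip)
```

Is this roughly the idea, or does your method handle the skip connection differently?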
So I wonder: is there a theorem behind the handling of residual networks, and if so, is it just an adaptation of your existing result?
Thank you in advance for your clarification!