I am very impressed with how you enforce constraints with Lagrange multipliers.
In the paper, I notice that affine layers are encoded as z(i) = W(i)z(i-1)+b(i), which captures the behavior of fully-connected, convolutional, and similar layers.
But an Add layer in a residual network (e.g. in an ONNX model) instead computes z(i) = z(i-1)+z(i-k), combining the output of the previous layer with a skip connection. I fail to see how your encoding extends to this case, yet I did observe residual networks in your experiments.
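For concreteness, here is my own guess at one possible reformulation (a minimal NumPy sketch, not something taken from the paper): the Add layer z(i) = z(i-1)+z(i-k) can be rewritten as an affine map over the stacked inputs [z(i-1); z(i-k)], with weight matrix W = [I | I] and zero bias, which would keep it within the affine form z(i) = W(i)z(i-1)+b(i).

```python
import numpy as np

# Hypothetical reformulation: a residual Add layer
#   z_i = z_{i-1} + z_{i-k}
# viewed as the affine map
#   z_i = W [z_{i-1}; z_{i-k}] + b,  with W = [I | I], b = 0.
n = 4
z_prev = np.arange(n, dtype=float)   # z_{i-1}, previous layer's output
z_skip = np.ones(n)                  # z_{i-k}, the skip connection

W = np.hstack([np.eye(n), np.eye(n)])            # [I | I]
b = np.zeros(n)
z_affine = W @ np.concatenate([z_prev, z_skip]) + b

# The affine form reproduces the Add layer exactly.
assert np.allclose(z_affine, z_prev + z_skip)
```

Is this roughly the idea, or does your method handle the skip connection differently?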
So I wonder: is there a theorem behind the handling of residual networks, and if so, is it just an adaptation of your existing result?
Thank you in advance for your clarification!