Closed julianhoever closed 1 year ago
Just as a note... Combining a linear operation with a batch norm mathematically works as follows:
$lin(x) = Ax+b$

$bn(x) = \frac{x-\mu}{\sqrt{\sigma^2+\epsilon}} \cdot \gamma + \beta$

$bn(lin(x)) = \frac{Ax+b-\mu}{\sqrt{\sigma^2+\epsilon}} \cdot \gamma + \beta = \frac{\gamma A}{\sqrt{\sigma^2+\epsilon}}x + \frac{\gamma (b-\mu)}{\sqrt{\sigma^2+\epsilon}} + \beta$

This results in $lin'(x) = A'x + b'$ with $A' = \frac{\gamma A}{\sqrt{\sigma^2+\epsilon}}$ and $b' = \frac{\gamma (b-\mu)}{\sqrt{\sigma^2+\epsilon}} + \beta$
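For a layer with several output features, the same result can be written per feature. This assumes (the thread does not spell it out) that $A$ is a matrix with one row per output feature and that $\gamma$, $\beta$, $\mu$, $\sigma^2$ and $b$ are vectors with one entry per output feature, indexed by $i$:

$A'_{ij} = \frac{\gamma_i}{\sqrt{\sigma_i^2+\epsilon}} A_{ij}$

$b'_i = \frac{\gamma_i (b_i-\mu_i)}{\sqrt{\sigma_i^2+\epsilon}} + \beta_i$

In other words, each row of $A$ is scaled by its own factor $\frac{\gamma_i}{\sqrt{\sigma_i^2+\epsilon}}$.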
If sigma, epsilon and the square root need to be calculated on the FPGA with fixed-point data, this could be challenging.
No. As far as I understand it, our plan is to collect sigma and epsilon only during training, and when translating the model to VHDL, we just calculate new weights according to my (hopefully correct) formulas. So no calculation of sigma and epsilon is needed on the FPGA side.
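A rough sketch of what that translation-time step could look like, in plain Python. This is not the project's actual code: the names `fold_batch_norm`, `linear` and `batch_norm` are hypothetical, weights are assumed to be stored as a list of rows (one per output feature), and the default `eps=1e-5` is an assumption as well.

```python
import math


def fold_batch_norm(A, b, gamma, beta, mean, var, eps=1e-5):
    """Fold bn(lin(x)) into a single linear layer and return (A', b').

    A is a list of rows, one row per output feature (assumed layout).
    b, gamma, beta, mean and var hold one value per output feature.
    """
    A_folded, b_folded = [], []
    for row, b_i, g, be, mu, v in zip(A, b, gamma, beta, mean, var):
        # Per-feature factor gamma / sqrt(sigma^2 + eps), computed offline.
        scale = g / math.sqrt(v + eps)
        A_folded.append([scale * a for a in row])
        b_folded.append(scale * (b_i - mu) + be)
    return A_folded, b_folded


def linear(A, b, x):
    """Reference linear layer: A x + b."""
    return [sum(a * x_j for a, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Reference batch norm in inference mode, using fixed statistics."""
    return [(y_i - mu) / math.sqrt(v + eps) * g + be
            for y_i, g, be, mu, v in zip(y, gamma, beta, mean, var)]
```

Comparing `linear(*fold_batch_norm(...), x)` against `batch_norm(linear(A, b, x), ...)` on a few inputs is a cheap way to check the formulas. The square root and division only appear inside `fold_batch_norm`, so under this approach the FPGA side would only ever see the already folded `A'` and `b'`.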
Great
formula looks correct ;)
Implement a combination of a linear layer followed by a batch norm layer that translates to a single linear layer where the batch norm operation is integrated in the linear calculation.