es-ude / elastic-ai.creator

Implement linear layer with a batch norm layer afterwards #240

Closed julianhoever closed 1 year ago

julianhoever commented 1 year ago

Implement a combination of a linear layer followed by a batch norm layer that translates to a single linear layer, where the batch norm operation is folded into the linear computation.
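
As a rough sketch, the training-time side could look like the following PyTorch module (the class name and structure are illustrative, not necessarily the project's actual API); the folding itself is shown after the formulas below:

```python
import torch
from torch import nn


class LinearWithBatchNorm(nn.Module):
    """Linear layer followed by BatchNorm1d, meant to be folded into a
    single linear layer at translation time (illustrative names only)."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.batch_norm = nn.BatchNorm1d(out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # During training the batch norm collects its running statistics;
        # at translation time only the fused weights would be exported.
        return self.batch_norm(self.linear(x))
```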

julianhoever commented 1 year ago

Just as a note... Combining a linear operation with a batch norm mathematically works as follows:

$$lin(x) = Ax + b$$

$$bn(x) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta$$

$$bn(lin(x)) = \frac{Ax + b - \mu}{\sqrt{\sigma^2 + \epsilon}} \cdot \gamma + \beta = \frac{\gamma A}{\sqrt{\sigma^2 + \epsilon}} x + \frac{\gamma (b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

This results in $lin'(x) = A'x + b'$ with

$$A' = \frac{\gamma A}{\sqrt{\sigma^2 + \epsilon}} \quad\text{and}\quad b' = \frac{\gamma (b - \mu)}{\sqrt{\sigma^2 + \epsilon}} + \beta$$
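
A minimal PyTorch sketch of this folding, assuming the usual `nn.Linear`/`nn.BatchNorm1d` parameter layout (the function name is illustrative, not the project's actual interface):

```python
import torch
from torch import nn


@torch.no_grad()
def fuse_linear_batch_norm(linear: nn.Linear, bn: nn.BatchNorm1d) -> nn.Linear:
    """Fold a trained BatchNorm1d into the preceding Linear layer.

    Implements A' = gamma * A / sqrt(sigma^2 + eps) and
               b' = gamma * (b - mu) / sqrt(sigma^2 + eps) + beta.
    """
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(sigma^2 + eps)
    fused = nn.Linear(linear.in_features, linear.out_features)
    fused.weight.copy_(linear.weight * scale.unsqueeze(1))  # A': each output row scaled
    bias = linear.bias if linear.bias is not None else torch.zeros(linear.out_features)
    fused.bias.copy_(scale * (bias - bn.running_mean) + bn.bias)  # b'
    return fused


# Quick check that the fused layer matches linear + batch norm in eval mode:
lin, bn = nn.Linear(4, 3), nn.BatchNorm1d(3).eval()
x = torch.randn(8, 4)
assert torch.allclose(bn(lin(x)), fuse_linear_batch_norm(lin, bn)(x), atol=1e-6)
```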

SuperChange001 commented 1 year ago

If sigma and epsilon need to be calculated on the FPGA with fixed-point data, this could be challenging, and so could the square root.

julianhoever commented 1 year ago

No. As far as I understand it, our plan is to collect sigma and epsilon only during training, and when translating the model to VHDL we just calculate new weights according to my (hopefully correct) formulas. So no calculation of sigma and epsilon is needed on the FPGA side.

SuperChange001 commented 1 year ago

Great

glencoe commented 1 year ago

formula looks correct ;)