fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Add support for Layer Normalization #1109

Open rianbrooksflynn opened 3 weeks ago

rianbrooksflynn commented 3 weeks ago

I've got a branch adding support for Layer Normalization using either Keras or PyTorch with the Vivado backend in io_parallel mode, and I'd like to submit a pull request.
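For reference, here is a minimal sketch of what using the layer from the Python side might look like once merged; the model shape, Dense layer, and output directory are illustrative assumptions, not code from the branch.

```python
# Illustrative sketch (not from the PR): converting a Keras model that
# contains a LayerNormalization layer with the Vivado backend in io_parallel.
import numpy as np
import tensorflow as tf
import hls4ml

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.LayerNormalization(),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vivado',        # backend targeted by the branch
    io_type='io_parallel',   # the supported IO mode per this issue
    output_dir='hls4ml_prj_layernorm',  # assumed project directory
)
hls_model.compile()

# Quick numerical sanity check against the Keras model
x = np.random.rand(100, 16).astype(np.float32)
print(np.max(np.abs(hls_model.predict(x) - model.predict(x))))
```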

The implementation uses a lookup table for the inverse square root; the lookup table inputs are distributed logarithmically for better accuracy. Tests have been added for both Keras and PyTorch parsing.
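To illustrate the idea behind the logarithmically distributed lookup table (a plain-NumPy sketch of the concept, not the fixed-point HLS code from the branch): the table stores 1/sqrt(x) at points spaced evenly in log2(x), so the relative error stays roughly flat across a wide range of variances instead of degrading for small values.

```python
# Conceptual sketch of a log-spaced inverse-square-root LUT.
# Table size and input range are assumptions for illustration only.
import numpy as np

N_TABLE = 1024                  # number of LUT entries (assumed)
X_MIN, X_MAX = 2**-10, 2**10    # variance range covered (assumed)

# Table entries at points spaced evenly in log2(x)
log_min, log_max = np.log2(X_MIN), np.log2(X_MAX)
table_x = 2.0 ** np.linspace(log_min, log_max, N_TABLE)
table = 1.0 / np.sqrt(table_x)

def invsqrt_lut(x):
    """Look up 1/sqrt(x) by indexing on log2(x)."""
    x = np.clip(x, X_MIN, X_MAX)
    idx = np.round((np.log2(x) - log_min) / (log_max - log_min) * (N_TABLE - 1))
    return table[idx.astype(int)]

# Relative error stays roughly uniform over many orders of magnitude,
# which is the benefit of spacing the inputs logarithmically.
x = np.logspace(-3, 3, 7)
print(np.abs(invsqrt_lut(x) * np.sqrt(x) - 1.0))
```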

Credit is due to @Ethan0Jiang and @LostEcho365 (Zhixing Jiang and Dennis Yin) for their Vivado implementation and Keras parsing support; my contributions were modifying the inverse-square-root lookup table implementation, implementing PyTorch parsing, and adding unit tests. (Here's a link to their pre-print.) The original authors have given permission for their code to be merged into hls4ml.
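As a rough picture of the parity-style check the unit tests presumably follow (hedged: this is not the test code from the branch, and the converter signature shown follows recent hls4ml releases; older ones pass the input shape to convert_from_pytorch_model instead):

```python
# Hedged sketch of a PyTorch parsing parity check; not the actual unit test
# from the branch, and converter signatures vary across hls4ml versions.
import numpy as np
import torch
import torch.nn as nn
import hls4ml

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.norm = nn.LayerNorm(16)

    def forward(self, x):
        return self.norm(x)

model = Model().eval()

# Recent hls4ml versions take the input shape in config_from_pytorch_model
# (assumption; older releases pass it to convert_from_pytorch_model).
config = hls4ml.utils.config_from_pytorch_model(model, (16,), granularity='name')
hls_model = hls4ml.converters.convert_from_pytorch_model(
    model,
    hls_config=config,
    backend='Vivado',
    io_type='io_parallel',
    output_dir='hls4ml_prj_layernorm_pt',  # assumed project directory
)
hls_model.compile()

x = np.random.rand(100, 16).astype(np.float32)
y_ref = model(torch.from_numpy(x)).detach().numpy()
# Tolerance is an assumption for illustration
np.testing.assert_allclose(hls_model.predict(x), y_ref, rtol=0, atol=0.05)
```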

While I haven't run this on an actual board, below are some latency and resource usage estimates from Vitis HLS 2023.2.

keras_layernorm_report.txt pytorch_layernorm_report.txt

I believe the transformer architecture is a widely requested feature for hls4ml, and Layer Normalization is a key step in that direction.

rianbrooksflynn commented 3 weeks ago

PR up here: https://github.com/fastmachinelearning/hls4ml/pull/1110