[Feature] Examples of how the llm pldi24-artifacts are generated

cornell-zhang / allo

Allo: A Programming Model for Composable Accelerator Design

Apache License 2.0

128 stars 21 forks source link

Is your feature request related to a problem? Please describe. I am trying to retarget the llm artifacts to my own FPGA board. I'd like to regenerate the HLS code to try more aggressive quantization schemes.

Describe the solution you'd like Please add some small examples of advanced optimization techniques that are used in the pldi24-artifact repo.

Mixed precision input/output for GEMM
Mixed precision activation/weight for GEMM
Mixed precision input/output for Softmax/Layernorm/Residual
Low-bit packing input/output for GEMM/Softmax

Additional context For example, the softmax operator requires the same fp32 datatype for both input and output. However, there is a mixed precision HLS implementation with input/output packing in the artifact code here. I searched the Allo repo and could not find a reference of how to generate such code.

cornell-zhang / allo

[Feature] Examples of how the llm pldi24-artifacts are generated #165