cornell-zhang / allo

Allo: A Programming Model for Composable Accelerator Design
https://cornell-zhang.github.io/allo
Apache License 2.0

[Feature] Examples of how the llm pldi24-artifacts are generated #165

Open · bibo-msft opened this issue 3 months ago

bibo-msft commented 3 months ago

**Is your feature request related to a problem? Please describe.**
I am trying to retarget the llm artifacts to my own FPGA board, and I'd like to regenerate the HLS code so I can try more aggressive quantization schemes.

**Describe the solution you'd like**
Please add some small examples of the advanced optimization techniques used in the pldi24-artifact repo.

**Additional context**
For example, the softmax operator requires the same fp32 datatype for both its input and output, yet the artifact code here contains a mixed-precision HLS implementation with input/output packing. I searched the Allo repo and could not find a reference for how to generate such code. A sketch of the uniform-precision baseline I mean is below.
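For concreteness, this is roughly what I can express today. It is a hypothetical sketch, not artifact code: the kernel name, the vector length `N`, and my use of `allo.exp` and the `allo.customize`/`build(target="vhls")` flow are assumptions on my part about the intended usage.

```python
# Sketch: uniform-precision softmax in Allo (hypothetical example).
# Assumes the allo.exp intrinsic and the customize/build flow.
import allo
from allo.ir.types import float32

N = 16  # vector length, chosen only for illustration

def softmax(x: float32[N]) -> float32[N]:
    out: float32[N] = 0.0
    exp_sum: float32 = 0.0
    # exponentiate each element and accumulate the normalization term
    for i in range(N):
        out[i] = allo.exp(x[i])
        exp_sum += out[i]
    # normalize by the accumulated sum
    for i in range(N):
        out[i] = out[i] / exp_sum
    return out

s = allo.customize(softmax)
print(s.build(target="vhls"))  # assumed to emit Vitis HLS C++ for inspection
```

Note that both `x` and `out` must be `float32` here; I do not see how to express the packed low-bit input/output of the artifact version in this form.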

chhzh123 commented 3 months ago

Hi @bibo-msft, thanks for raising the issue! The PLDI'24 artifact was not generated purely by Allo; there are some manual hacks in the kernels, and we are still automating that process.

Currently, we have a script for generating the Transformer kernels; please check out this page for the instructions. This test case also shows a low-bit packing example for GEMM, and you can change the bitwidths in the type parameters to generate different GEMM kernels, as sketched below.
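As a rough illustration of that pattern (a sketch only, not the test itself: the `build_gemm` wrapper, the matrix sizes, and the use of `Int(bits)` operands accumulating into `int32` are assumptions for the example), the bitwidth can be swapped by closing over the type in a Python generator function:

```python
# Sketch: type-parameterized low-bit GEMM in Allo (hypothetical helper).
# Assumes arbitrary-precision Int(bits) types and allo.grid/allo.reduction.
import allo
from allo.ir.types import Int, int32

def build_gemm(bits=4, M=16, N=16, K=16):
    Ty = Int(bits)  # low-bit operand type, e.g. Int(4) for 4-bit weights

    def gemm(A: Ty[M, K], W: Ty[K, N]) -> int32[M, N]:
        C: int32[M, N] = 0
        for i, j in allo.grid(M, N):
            for k in allo.reduction(K):
                C[i, j] += A[i, k] * W[k, j]
        return C

    s = allo.customize(gemm)
    return s.build(target="vhls")  # assumed to return the generated HLS C++

# e.g. compare 4-bit and 8-bit variants of the same kernel
print(build_gemm(bits=4))
print(build_gemm(bits=8))
```

Closing over `Ty` this way keeps a single kernel definition while letting each instantiation specialize the operand precision before HLS code generation.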

We will provide additional examples of mixed-precision kernels soon and will notify you once they are available. Please feel free to share any other suggestions you may have. Thank you!