alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0

support to run example/test.py and integrate optimized gemm/attention operator #113

Closed: TianyuLi0 closed this 2 months ago

TianyuLi0 commented 2 months ago

- Support running example/test.py on Arm
- Integrate optimized gemm kernel from Xujie
- Integrate optimized attention operator from Ruifeng
- Integrate sampleGreedy from Haijiang
- Fix rebase/integration issues
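For reviewers unfamiliar with the term, here is a rough illustration of what greedy sampling generally means: for each sequence in the batch, pick the token id with the highest logit. This is only a sketch in plain Python, not the sampleGreedy code integrated in this PR; the function name `sample_greedy` and its list-of-lists input are assumptions made for illustration.

```python
from typing import List, Sequence


def sample_greedy(logits: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: return the argmax token id for each row of logits.

    `logits` is assumed to have shape [batch_size, vocab_size].
    """
    token_ids = []
    for row in logits:
        if not row:
            raise ValueError("each logits row must contain at least one value")
        # max() over indices keyed by logit value; ties resolve to the lowest index.
        token_ids.append(max(range(len(row)), key=row.__getitem__))
    return token_ids
```

For example, `sample_greedy([[0.1, 2.5, -1.0], [3.0, 3.0, 0.0]])` returns `[1, 0]`.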

Note: some workarounds were applied to get the build to pass: warnings-as-errors is turned off, and there are changes to the Workspace/bazel build and to the requirements txt. These may need to be reverted before merge.

To run the example:

    bazel test //example:test --config=arm --test_output=all

CLAassistant commented 2 months ago

CLA assistant check
All committers have signed the CLA.