RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0
Hello, I'd like to ask a question that may not be very professional: in the code, the weights are loaded in Python. Where are they passed to the C++ (FasterTransformer) part? #86
https://github.com/alibaba/rtp-llm/blob/04fe4dafe5d204d14ec41f1b2ab0212398751d4b/maga_transformer/ops/rtp_llm/rtp_llm_op.py#L21