alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Apache License 2.0
544 stars
50 forks
issues
Bazel build fails on the latest code
#126
samaritan1998
opened
5 days ago
5
Are there plans to support qwen2vl?
#125
samaritan1998
closed
3 days ago
2
Support 8-bit quantization when deploying embedding and reranker models
#124
invisifire
opened
1 week ago
2
Does 0.1.13 support llama3? If not, when will the 0.2.0 image be released?
#123
Xerxes-cn
closed
1 week ago
0
head_num and tp_size do not match
#122
yato1997
closed
2 weeks ago
0
fix: multiple request with top_k = 0 out of vocab
#121
TianyuLi0
closed
3 weeks ago
1
examples/test.py fails to run even on two 16 GB T4 cards
#120
zhangtaibo
opened
1 month ago
1
[cpu] add sampleGreedy implementation
#119
wenhuanh
closed
1 month ago
0
fix: open source build and deps on Arm
#118
TianyuLi0
closed
3 weeks ago
1
perf: optimization of attention, softmax, layernorm
#117
Reyfone
closed
1 month ago
0
Add grouped query attention support
#116
Reyfone
closed
2 months ago
0
[Doc] Suggested changes to the multi-GPU parallelism documentation
#115
linnlh
opened
2 months ago
0
In RTP-LLM mode, llama3.1 FP16 results differ
#114
anigi98932
opened
2 months ago
1
support to run example/test.py and integrate optimized gemm/attention operator
#113
TianyuLi0
closed
2 months ago
1
support to run example/test.py on Arm
#112
TianyuLi0
closed
2 months ago
0
Dual A6000 inference: after inference finishes, one card shows 0% GPU utilization and the other shows 100%
#111
zf761
closed
3 weeks ago
1
Cannot run the Python test scripts in the tests directory: libtest_ops.so is missing
#110
leepoly
opened
2 months ago
1
Dual A6000 inference: after inference finishes, one card shows 0% GPU utilization and the other shows 100%
#109
zf761
opened
2 months ago
1
fix: unit test and cpp model test
#108
Reyfone
closed
2 months ago
0
Enable MHA parallel on Arm
#107
Reyfone
closed
2 months ago
0
attention: add MHA parallel support
#106
Reyfone
closed
2 months ago
1
Speculative sampling with medusa: error when loading the official medusa model
#105
wcsjtu
opened
2 months ago
6
Reranker token-length interception behaves abnormally
#104
invisifire
closed
2 months ago
2
add opt_125M
#103
Nanuion
opened
3 months ago
2
Newly added OPT model produces unexpected output
#102
samaritan1998
closed
3 months ago
0
[CPU] add implementation for GEMM and token embedding
#101
wenhuanh
closed
2 months ago
0
Inference produces garbled output (показать показать показать показать показать) (USE_NEW_DEVICE_IMPL=1)
#100
w066650
opened
3 months ago
1
[ROCm] refine quantization related code
#99
feifei14119
closed
3 months ago
2
[ROCm] MoE version1
#98
feifei14119
closed
3 months ago
1
[ROCm] Support Int4 and bf16 for rocm version
#97
feifei14119
closed
3 months ago
0
[ROCm] add quant op and port rccl
#96
feifei14119
closed
3 months ago
0
After adding the OPT model, it fails to run and reports a CUDA error
#95
samaritan1998
closed
3 months ago
5
[ROCm] Includes docker container creation script file for rocm build
#94
feifei14119
closed
3 months ago
1
[ROCm] Fix ROCm sampler OP test
#93
feifei14119
closed
3 months ago
0
[cpu-impl] Add for layernorm and rmsnorm
#92
wenhuanh
closed
3 months ago
0
HELP: No matching distribution found for torch==2.1.0+cu121 error while install maga_transformer with .whl in release 0.2.0
#91
HuXinjing
opened
4 months ago
7
fix: adapt to index based kv cache for Arm device
#90
Reyfone
closed
4 months ago
1
`Illegal instruction` error when running version 0.2.0
#89
frankang
closed
3 months ago
2
bazel build error
#88
frankang
closed
4 months ago
2
[ROCm] Port basic gpt model to rocm. qwen2 end-to-end test pass
#87
feifei14119
closed
3 months ago
5
Hello, I'd like to ask a question that might not be very professional. In the code, the weights are loaded through Python. Where are they passed to the C++ (fasttransformer) part?
#86
samaritan1998
closed
4 months ago
1
[DRAFT] not ready, please do NOT review
#85
feifei14119
closed
4 months ago
1
support DeepSeek-V2-Lite-Chat
#84
jianglan89
opened
4 months ago
1
feat: add arm cpu device support
#83
TianyuLi0
closed
4 months ago
1
Multi-node single-GPU/multi-GPU: error gang_info self None
#82
MasterJanus
closed
4 months ago
0
[ROCm] Init rocm_impl device and add test op
#81
feifei14119
closed
4 months ago
4
feat: add cpu attention api
#80
wenhuanh
closed
4 months ago
0
[ROCm] Initial enablement
#79
draganmladjenovic
closed
4 months ago
6
git clone Error
#78
hz0ne
closed
3 months ago
3
fix(src): fix bazel build special type cast and template match for cuda118
#77
khan-yin
closed
4 months ago
14