alibaba rtp-llm issues - Githubissues

alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

Apache License 2.0

544 stars 50 forks

issues

Bazel build fails on the latest code

#126 samaritan1998 opened 5 days ago
5
Is there a plan to support qwen2vl?

#125 samaritan1998 closed 3 days ago
2
Support 8-bit quantization when deploying embedding and reranker models

#124 invisifire opened 1 week ago
2
Does 0.1.13 support llama3? If not, when will the 0.2.0 image be released?

#123 Xerxes-cn closed 1 week ago
0
head_num and tp_size do not match

#122 yato1997 closed 2 weeks ago
0
fix: multiple request with top_k = 0 out of vocab

#121 TianyuLi0 closed 3 weeks ago
1
examples/test.py fails to run even on two 16 GB T4 cards

#120 zhangtaibo opened 1 month ago
1
[cpu] add sampleGreedy implementation

#119 wenhuanh closed 1 month ago
0
fix: open source build and deps on Arm

#118 TianyuLi0 closed 3 weeks ago
1
perf: optimization of attention, softmax, layernorm

#117 Reyfone closed 1 month ago
0
Add grouped query attention support

#116 Reyfone closed 2 months ago
0
[Doc] Suggested revisions to the multi-GPU parallelism documentation

#115 linnlh opened 2 months ago
0
In RTP-LLM mode, llama3.1 FP16 results differ

#114 anigi98932 opened 2 months ago
1
support to run example/test.py and integrate optimized gemm/attention operator

#113 TianyuLi0 closed 2 months ago
1
support to run example/test.py on Arm

#112 TianyuLi0 closed 2 months ago
0
Dual A6000 inference: after model inference finishes, one card shows 0% GPU utilization and the other 100%

#111 zf761 closed 3 weeks ago
1
Cannot run the Python test scripts under the tests directory; libtest_ops.so is missing

#110 leepoly opened 2 months ago
1
Dual A6000 inference: after model inference finishes, one card shows 0% GPU utilization and the other 100%

#109 zf761 opened 2 months ago
1
fix: unit test and cpp model test

#108 Reyfone closed 2 months ago
0
Enable MHA parallel on Arm

#107 Reyfone closed 2 months ago
0
attention: add MHA parallel support

#106 Reyfone closed 2 months ago
1
Speculative sampling: error when using medusa to load the official medusa model

#105 wcsjtu opened 2 months ago
6
Reranker token-length interception behaves abnormally

#104 invisifire closed 2 months ago
2
add opt_125M

#103 Nanuion opened 3 months ago
2
After adding the OPT model, the model output is not as expected

#102 samaritan1998 closed 3 months ago
0
[CPU] add implementation for GEMM and token embedding

#101 wenhuanh closed 2 months ago
0
Inference produces garbled output (показать показать показать показать показать) (USE_NEW_DEVICE_IMPL=1)

#100 w066650 opened 3 months ago
1
[ROCm] refine quantization related code

#99 feifei14119 closed 3 months ago
2
[ROCm] MoE version1

#98 feifei14119 closed 3 months ago
1
[ROCm] Support Int4 and bf16 for rocm version

#97 feifei14119 closed 3 months ago
0
[ROCm] add quant op and port rccl

#96 feifei14119 closed 3 months ago
0
After adding the OPT model, it fails to run and reports a CUDA error

#95 samaritan1998 closed 3 months ago
5
[ROCm] Includes docker container creation script file for rocm build

#94 feifei14119 closed 3 months ago
1
[ROCm] Fix ROCm sampler OP test

#93 feifei14119 closed 3 months ago
0
[cpu-impl] Add for layernorm and rmsnorm

#92 wenhuanh closed 3 months ago
0
HELP: No matching distribution found for torch==2.1.0+cu121 error while install maga_transformer with .whl in release 0.2.0

#91 HuXinjing opened 4 months ago
7
fix: adapt to index based kv cache for Arm device

#90 Reyfone closed 4 months ago
1
`Illegal instruction` error when running version 0.2.0

#89 frankang closed 3 months ago
2
bazel build error

#88 frankang closed 4 months ago
2
[ROCm] Port basic gpt model to rocm. qwen2 end-to-end test pass

#87 feifei14119 closed 3 months ago
5
Hello, I'd like to ask a question that might not be very professional. In the code, the weights are loaded through Python. Where are they passed to the C++ (fasttransformer) part?

#86 samaritan1998 closed 4 months ago
1
[DRAFT] not ready, please do NOT review

#85 feifei14119 closed 4 months ago
1
support DeepSeek-V2-Lite-Chat

#84 jianglan89 opened 4 months ago
1
feat: add arm cpu device support

#83 TianyuLi0 closed 4 months ago
1
Multi-node single-GPU/multi-GPU: error gang_info self None

#82 MasterJanus closed 4 months ago
0
[ROCm] Init rocm_impl device and add test op

#81 feifei14119 closed 4 months ago
4
feat: add cpu attention api

#80 wenhuanh closed 4 months ago
0
[ROCm] Initial enablement

#79 draganmladjenovic closed 4 months ago
6
git clone Error

#78 hz0ne closed 3 months ago
3
fix(src): fix bazel build special type cast and template match for cuda118

#77 khan-yin closed 4 months ago
14