InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0 · 3.13k stars · 280 forks
Issues
#1793 Support Qwen2-1.5b awq · AllentDan · closed 1 week ago · 7 comments
#1792 [Bug] "Aborted (core dumped)" when running Qwen2-7B-Instruct · kaishxu · closed 1 week ago · 7 comments
#1791 fix: prevent numpy breakage · zhyncs · closed 1 week ago · 2 comments
#1790 [Feature] Performance testing of multimodal api_server inference speed · LRHstudy · opened 2 weeks ago · 7 comments
#1789 Refine AsyncEngine exception handler · AllentDan · closed 1 week ago · 1 comment
#1788 [Bug] Client-aborted streaming requests "leak", which eventually stalls/crashes turbomind after 100 to 300 requests · josephrocca · closed 1 week ago · 4 comments
#1787 Is the OpenAI parameter `n` supported? Setting n>1 still returns only one result · hitsz-zxw · opened 2 weeks ago · 1 comment
#1786 [Bug] Qwen/Qwen2-72B-Instruct AWQ Quantization NaN Error · serser · opened 2 weeks ago · 9 comments
#1785 [Docs] Is the throughput improvement mainly due to the rewritten GQA kernel? · CSEEduanyu · opened 2 weeks ago · 9 comments
#1784 [Feature] support Nemotron-4 340B · zhyncs · opened 2 weeks ago · 1 comment
#1783 Getting deterministic answers from VLM models such as InternVL-Chat-V1-5-AWQ · tairen99 · closed 6 days ago · 9 comments
#1782 support qwen2 1.5b · lvhan028 · closed 2 weeks ago · 4 comments
#1781 [Bug] Error at runtime · bltcn · closed 2 weeks ago · 1 comment
#1780 Add anomaly handler · lzhangzz · closed 2 weeks ago · 0 comments
#1779 The multimodal base64 interface produces differing results · CSEEduanyu · opened 2 weeks ago · 3 comments
#1778 [side-effect] Fix param `--cache-max-entry-count` is not taking effect (#1758) · QwertyJack · closed 2 weeks ago · 2 comments
#1777 [Feature] Qwen2 series models · Vincent131499 · closed 1 week ago · 8 comments
#1776 [Feature] Can quantization of Zhipu's CogVLM2 be supported? · EasonGZY · opened 2 weeks ago · 1 comment
#1775 Device dispatcher · grimoire · closed 1 week ago · 7 comments
#1774 Can deployment of the mini_internvl_2b_1.5 model be supported? · moyans · closed 2 weeks ago · 2 comments
#1773 Encode raw image file to base64 · irexyc · closed 2 weeks ago · 0 comments
#1772 add qwen2 model into testcase · zhulinJulia24 · closed 2 weeks ago · 0 comments
#1771 Error when loading 'openbmb/MiniCPM-Llama3-V-2_5' · Fahmie23 · opened 2 weeks ago · 20 comments
#1770 lock setuptools version in dockerfile · RunningLeon · closed 2 weeks ago · 0 comments
#1769 skip inference for oversized inputs · grimoire · closed 1 week ago · 0 comments
#1768 Fix finish_reason · AllentDan · closed 2 weeks ago · 1 comment
#1767 [Feature] support edge chips · PredyDaddy · closed 2 weeks ago · 3 comments
#1766 [Bug] Why does the pipeline output only a single token? · Axiaozhu1 · opened 2 weeks ago · 13 comments
#1765 More accurate time logging for ImageEncoder and fix concurrent image processing corruption · irexyc · closed 1 week ago · 2 comments
#1764 [Feature] Is ChatGLM3 supported? · Franklin-L · opened 2 weeks ago · 1 comment
#1763 Add tools to api_server for InternLM2 model · AllentDan · opened 2 weeks ago · 9 comments
#1762 [Feature] Do multimodal models support online serving? · CSEEduanyu · closed 4 days ago · 12 comments
#1761 fix falcon attention · grimoire · closed 2 weeks ago · 0 comments
#1760 [Feature] Run inference with lmdeploy using already-constructed inputs · KooSung · closed 6 days ago · 5 comments
#1759 [Bug] Inaccurate timing statistics in ImageEncoder INFO logs · DefTruth · closed 2 weeks ago · 3 comments
#1758 [Bug] Turbomind backend GPU memory usage doubles · QwertyJack · closed 2 weeks ago · 5 comments
#1757 [Bug] Conditional check · seetimee · closed 1 week ago · 2 comments
#1756 [Bug] Key error loading OpenGVLab/Mini-InternVL-Chat-4B-V1-5 · HaoLiuHust · closed 1 week ago · 2 comments
#1755 [Bug] tp=4 tp=8 no response · zeroleavebaoyang · opened 2 weeks ago · 6 comments
#1754 fix uncached stop words · grimoire · closed 2 weeks ago · 3 comments
#1753 Detokenize with prompt token ids · AllentDan · closed 1 week ago · 0 comments
#1752 [Bug] Pipeline inference with CogVLM2 works fine, but the server fails · xiangqi1997 · closed 2 weeks ago · 2 comments
#1751 refactor config · grimoire · closed 2 weeks ago · 0 comments
#1750 [Bug] Official image doesn't work for 4090 on CUDA 12.3 (but works for all other CUDA versions, and works for 12.3 on other GPU types) · josephrocca · opened 2 weeks ago · 5 comments
#1749 [Feature] Low priority: Allow specifying HuggingFace model/repo name in `lmdeploy convert` · josephrocca · opened 2 weeks ago · 2 comments
#1748 [Feature] Support for compact Vision-Language models · vody-am · opened 2 weeks ago · 3 comments
#1747 [Bug] xcomposer 4khd lora weight error in lmdeploy · ztfmars · closed 1 day ago · 11 comments
#1746 [Feature] Qwen 2 Support · suptejas · closed 3 weeks ago · 2 comments
#1745 [Feature] `min_p` sampling parameter · josephrocca · opened 3 weeks ago · 1 comment
#1744 [Bug] Many concurrent requests with `--enable-prefix-caching` AND `--quant-policy 8` crash with: `CUDA runtime error: an illegal memory access was encountered /opt/lmdeploy/src/turbomind/utils/allocator.h:231` · josephrocca · closed 1 week ago · 22 comments