InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0 · 3.13k stars · 280 forks
Issues
#1793 Support Qwen2-1.5b awq · AllentDan · closed 1 week ago · 7 comments
#1792 [Bug] "Aborted (core dumped)" when running Qwen2-7B-Instruct · kaishxu · closed 1 week ago · 7 comments
#1791 fix: prevent numpy breakage · zhyncs · closed 1 week ago · 2 comments
#1790 [Feature] Performance testing of multimodal api_server inference speed · LRHstudy · opened 2 weeks ago · 7 comments
#1789 Refine AsyncEngine exception handler · AllentDan · closed 1 week ago · 1 comment
#1788 [Bug] Client-aborted streaming requests "leak", which eventually stalls/crashes turbomind after 100 to 300 requests · josephrocca · closed 1 week ago · 4 comments
#1787 Is the OpenAI parameter `n` supported? Setting n>1 still returns only one result · hitsz-zxw · opened 2 weeks ago · 1 comment
#1786 [Bug] Qwen/Qwen2-72B-Instruct AWQ Quantization NaN Error · serser · opened 2 weeks ago · 9 comments
#1785 [Docs] Is the throughput improvement mainly due to the rewritten GQA kernel? · CSEEduanyu · opened 2 weeks ago · 9 comments
#1784 [Feature] support Nemotron-4 340B · zhyncs · opened 2 weeks ago · 1 comment
#1783 Getting deterministic answers from VLM models such as InternVL-Chat-V1-5-AWQ · tairen99 · closed 6 days ago · 9 comments
#1782 support qwen2 1.5b · lvhan028 · closed 2 weeks ago · 4 comments
#1781 [Bug] Error at runtime · bltcn · closed 2 weeks ago · 1 comment
#1780 Add anomaly handler · lzhangzz · closed 2 weeks ago · 0 comments
#1779 The multimodal base64 interface produces differing results · CSEEduanyu · opened 2 weeks ago · 3 comments
#1778 [side-effect] Fix param `--cache-max-entry-count` is not taking effect (#1758) · QwertyJack · closed 2 weeks ago · 2 comments
#1777 [Feature] Qwen2 series models · Vincent131499 · closed 1 week ago · 8 comments
#1776 [Feature] Can quantization of Zhipu's CogVLM2 be supported? · EasonGZY · opened 2 weeks ago · 1 comment
#1775 Device dispatcher · grimoire · closed 1 week ago · 7 comments
#1774 Can deployment of the mini_internvl_2b_1.5 model be supported? · moyans · closed 2 weeks ago · 2 comments
#1773 Encode raw image file to base64 · irexyc · closed 2 weeks ago · 0 comments
#1772 add qwen2 model into testcase · zhulinJulia24 · closed 2 weeks ago · 0 comments
#1771 Error when loading 'openbmb/MiniCPM-Llama3-V-2_5' · Fahmie23 · opened 2 weeks ago · 20 comments
#1770 lock setuptools version in dockerfile · RunningLeon · closed 2 weeks ago · 0 comments
#1769 skip inference for oversized inputs · grimoire · closed 1 week ago · 0 comments
#1768 Fix finish_reason · AllentDan · closed 2 weeks ago · 1 comment
#1767 [Feature] support edge chips · PredyDaddy · closed 2 weeks ago · 3 comments
#1766 [Bug] Why does the pipeline output only a single token? · Axiaozhu1 · opened 2 weeks ago · 13 comments
#1765 More accurate time logging for ImageEncoder and fix concurrent image processing corruption · irexyc · closed 1 week ago · 2 comments
#1764 [Feature] Is ChatGLM3 supported? · Franklin-L · opened 2 weeks ago · 1 comment
#1763 Add tools to api_server for InternLM2 model · AllentDan · opened 2 weeks ago · 9 comments
#1762 [Feature] Do multimodal models support online serving? · CSEEduanyu · closed 4 days ago · 12 comments
#1761 fix falcon attention · grimoire · closed 2 weeks ago · 0 comments
#1760 [Feature] Run inference with lmdeploy using already-constructed inputs · KooSung · closed 6 days ago · 5 comments
#1759 [Bug] Inaccurate timing statistics in ImageEncoder INFO logs · DefTruth · closed 2 weeks ago · 3 comments
#1758 [Bug] Turbomind backend GPU memory usage doubles · QwertyJack · closed 2 weeks ago · 5 comments
#1757 [Bug] Conditional check · seetimee · closed 1 week ago · 2 comments
#1756 [Bug] Key error loading OpenGVLab/Mini-InternVL-Chat-4B-V1-5 · HaoLiuHust · closed 1 week ago · 2 comments
#1755 [Bug] tp=4 tp=8 no response · zeroleavebaoyang · opened 2 weeks ago · 6 comments
#1754 fix uncached stop words · grimoire · closed 2 weeks ago · 3 comments
#1753 Detokenize with prompt token ids · AllentDan · closed 1 week ago · 0 comments
#1752 [Bug] Pipeline inference with CogVLM2 works fine, but the server fails · xiangqi1997 · closed 2 weeks ago · 2 comments
#1751 refactor config · grimoire · closed 2 weeks ago · 0 comments
#1750 [Bug] Official image doesn't work for 4090 on CUDA 12.3 (but works for all other CUDA versions, and works for 12.3 on other GPU types) · josephrocca · opened 2 weeks ago · 5 comments
#1749 [Feature] Low priority: Allow specifying HuggingFace model/repo name in `lmdeploy convert` · josephrocca · opened 2 weeks ago · 2 comments
#1748 [Feature] Support for compact Vision-Language models · vody-am · opened 2 weeks ago · 3 comments
#1747 [Bug] xcomposer 4khd lora weight error in lmdeploy · ztfmars · closed 1 day ago · 11 comments
#1746 [Feature] Qwen 2 Support · suptejas · closed 3 weeks ago · 2 comments
#1745 [Feature] `min_p` sampling parameter · josephrocca · opened 3 weeks ago · 1 comment
#1744 [Bug] Many concurrent requests with `--enable-prefix-caching` AND `--quant-policy 8` crash with: `CUDA runtime error: an illegal memory access was encountered /opt/lmdeploy/src/turbomind/utils/allocator.h:231` · josephrocca · closed 1 week ago · 22 comments