InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] API server with Yi: prompts longer than about 1986 Chinese characters return empty text and a negative token count #924

Closed weicheng59 closed 8 months ago

weicheng59 commented 8 months ago

Checklist

Describe the bug

Model:

https://huggingface.co/01-ai/Yi-34B-Chat https://huggingface.co/NousResearch/Nous-Capybara-34B

Api usage:

url: http://192.168.40.41:8888/v1/chat/interactive

body: 1986 Chinese character request

{
    "model": "yi",
    "prompt": "怪牛售些馅图送许淋娇悠粉况渗宴填符格乡明附结郎操作假电败衡混秀屈阔丛惰惨粮炕救未酒遵糟今弟拼蛋络赵朗菌多灵吗诵封李司棒稳微数颜溉库扶俩惑煤实静衬脂毫幕绣答诱候码掀东汁爷丢废落己绝葛露步址隐锯嚷惩樱宋右影阶啦创薪蹄挖温误边问冶酿欢世汉崭嘉或男饺识紧益呆屑磨椅育革其定述觉移唉扬含利僚柔滤忧雾网存搬浇雪很侦诗眠蛙童荒情给已径控陈狭稍泉胃涉虏窑伏估宣熄贤鞋堤忌辱基专亲睁旗几粥批勾牙佳旷场饼轧之递空里宅传晚振雨翼易谣哲宁叮乌亩躺刘墙枝态规树搅麻县招瞧厦玉纸动挂搭敢染危窗淹披炸芝可窄鹊四排毙亏驻闯婆十秃顾式秒齿叔但敬台条清礼生掏胁受敲吉筑枪焦泰润飘撑逆葵蔑探现脊矮声雕荡屠地钻严笑挠叶询旁刺尾躁鞭耗顺狱冰牧初聋尸旧榆坡寨愤潮踩迁评申铃章蚂都师逼羞拳除夺筐每行芒圆棚蜓馆炮愿座审删巧掠决句也域兆腥倚价飞蝴寇蔽汇两躬刃扮左新串俯桥灌囊傲芹不聚装事鬼劲翁蒙陵赤依铺委技讯将更盾尖繁弊峰一洲耍冈允嗓卡狡简梦渡妥掘和耕月阻烈担呢煮钉钟团饱傍鄙贩胳赢脾他银触轿被向汤版拾笨她贸前摄时逝肢悄歼翅拣闷稀夫充羽殿下平岗剥虫绩泥楚巴菊汗凶镰冻北膨孩杨合慎虎良巾魔产喂漏拦头踏轰耽争孝舒拔纯白镜类棍善玩酸笋陶沾谱姜舰孤搂菜尿万奉絮至湾茫远舅卜呈祝邪轻宇催丰酬又把腰眉血沸耳蕉言洪沃民痛兰泪横丘陪宫谎写乎刚浓彼份彻裙命欣剂唇理研宽据鼠叛料禁史剃欠坏忽度特略么房悬渔榴便朽灯淘绘六宗确官拿保辜害件筝枯乞矿思系扣舟力咸齐倾孕绒盟沙匆锤猫程缺直阅怨分肌记篮撇有晌疼禽铜报砍避爱长累味划跟珠柜惊屿厨暴眼我仙椒壤拆仗梅疯硬插坛快活锹词票弹古疆皆足释艇泳够得原淡颂匠化惠重峡级鹿叼炒描享慨抽洞信灿筒赶凑踪爹悦折霸音肚粒土肾爸缝致服为案量萌滚默卵宜握喉习券收芦像贞爬疏臣席蚊威七过洁即循疤极次烂班风射圣览唤鲜心纱短叫鞠妻袖邮页臭萄臂助往家朴浮桂锻判雹泻拖印茄速塔晃障典岛州燃农沈派笼搞怒蒸咏饲术槐劫腾旱社森浅畏届机垮锈堵歇浙扎线切韵岩吞誓克找距贱拒笛梯晕积傅马转绳粱丈军需迈孔众还炎追罩餐护摊石剩鸦崖葡裤然昌熊恨蛇辈变兔断乃妙诸责励播侧吵扰缴雀龟哨偏汪绪杯药毅弯灶念宿纲纪暂恰昼渐法中旨瑞翠摔逮聪室捕菠饶旅核介毕金滴最懂城斑怀失际桐汽调约怕赔罐街骨帜伶纹侮缠照查山常裕通穗铸景厕秆栽廊慧欧捧采埋该围名骤反键猛而乓颠户乘歪奶筹潜务墓轮政衫亿肩谁赌屯破蹈疫喘旦源筋律拜真概患免犬胸显令脉船院叨俗剖祸饿努肝值增坑垒纺疗床衰扔倦稠胆龙泼延仪耐怖路编难贷俱竿钱参演居签悔材语劳锋拴腊奇烤锦吊授圈籍煌黑宙榜晶瘦撒扛勇锅厘层鱼钓素独塞压柄注慢游凝近瓦懒茅刻敏三职伟捎踢临艘忙膛备奋誉状挥劝尼萝沟群慕列猜如百型壶宪兄讽庙费间块索馋别响丝挑畅睡抛诉抢腹袜伍脏持水姻贝肥南上滔口超率季洗趴老日构阿盘嫩销减吨滑框片橘矩坦仇舱擦建丸幻勒桶火越及择趣骑罢赴讲裁冷境乒屋赚从订膀亚乐告拘钳辟抖油咐植则截底绢纳咱抵厚来亮朋村趟监进遭闻标弱怜秧贪刷捏姐葬那第悲预相俊元趋义赖车牵甜燥补午肯甲开死浪秋挡遣接位梢叹华阁界译纽面咬亡对丑冤震并茧虾箭着棉逗碌带迅挺枣辩尺各话惧捐跨唱鹰乖队酷哪肆蓝昨健仆松蛾形若妈萍留廉二疑组彩谊疮伪退江迷客驶舞剪痕检停违佩校谢猴望鸭吓馒啄宾拥喇爽裹伐后燕羡奖镇斗光饭蝶鹅莫顷在凭盈愁哥驾神悉搜了桑铁艳伙脚族炼翻花市洋闪睛顶奔梨府姿雁裂粪揪拉满姑战慌复策傻冒壳虽于喝锡孙钞具羊且怎蜡呜磁器竖姨鸡寸碍首管改岸滥宝泊苍桨锁滨付否读意寒毛驳仅住盐婚捡茂苦藏碎营涝倡它段哄均软竞无隶返准毯配捆陡详维喊痰晨堡庄宰犁普想替霉测覆安抬帘帮柱摩肤供肺盲胖粘绑乳限烧蜘扭录寄文阳兴盛井肉辰唐扑塌蔬偶央入字冠烫取园休束由斜济年棋驴苗见遥证蛛负弓梳艰似朱却狸芽大吹究倍随壁修攀选完捉秤帆再偿践海泛急小喜捞提亦胶广额办辞副舌跪蹲甚待协果脱娱赞愚漂隔坐乔私渠谷立德晋效夏妇缸涌公辽尝子考止投徐贴隆感阀湖单贼星宏溪骆消商榨身妄尤冬会抄仁奸异塘差属毒先粗算邻咳以体呀荣荐王康区就优茎圾污鼓麦摆视档忠非鉴匀寿脑炭制环赏氏恼俘糕帐裳猎烛搁妹柿斯辛液妨勿针绞例蜂篇迫某吧做拢警交董遇阴番嗽样访殖知袍摇掩少歉号睬幼节迎庭帝奥绕朵胀坚青诊饮岭天乙抓架品贺卧灭借纤蠢龄薯胡险援展货练离虹徒嚼棕氧缩滩躲劈占遗欺医耻僻糖蹦叙苏股跑璃扫周局耀早惹课侄港密损跃熟兼始代嫂布柴母必武久病膜俭症墨贵酱斧漫吃末骗肠店吼矛闸婶郑杠卖夹比眨内获叠掉辅钥沿谨食赛豪揭企凳示顽甩姥皱这扩炉财执缎册窃惯智归颗腔既终范骡匪省堂幸此吴领煎浑仔波蚁教困腐尽紫吸们妖献窝佛痒壮虚伤刮辣祖辫整闭醒九债抱女勤坊任卷灾凡忘恋撞衔较滋悼升八摘瞒寺斥势织暮洽父纠贫牲巷烘认道黄召魄使儿烦浆物尘放爆丧雄谅发毁秩草屡垃科宵迹你桌挽巨葱激抹太透训蚕械奏达杜租站颤恳敞劣稻垂醉棵熔林驼旋恶吐狼堆戚岔残鸽衣鼻计造铅暖秘梁削寻盗赠容栋融御雅绸析说关盖娃侍回购脆肿娘涨连钢所期牢伴讨铲颈醋互半蓄联拨加稿侨主厉款田啊啊啊啊
啊噢",
    "session_id": -1,
    "temperature": 0.6,
    "request_output_len": 512,
    "repetition_penalty": 1,
    "interactive_mode": false
}

response is normal:

{
    "text": "很抱歉",
    "tokens": 2,
    "finish_reason": "stop"
}

body: 1987 Chinese character request

{
    "model": "yi",
    "prompt": "怪牛售些馅图送许淋娇悠粉况渗宴填符格乡明附结郎操作假电败衡混秀屈阔丛惰惨粮炕救未酒遵糟今弟拼蛋络赵朗菌多灵吗诵封李司棒稳微数颜溉库扶俩惑煤实静衬脂毫幕绣答诱候码掀东汁爷丢废落己绝葛露步址隐锯嚷惩樱宋右影阶啦创薪蹄挖温误边问冶酿欢世汉崭嘉或男饺识紧益呆屑磨椅育革其定述觉移唉扬含利僚柔滤忧雾网存搬浇雪很侦诗眠蛙童荒情给已径控陈狭稍泉胃涉虏窑伏估宣熄贤鞋堤忌辱基专亲睁旗几粥批勾牙佳旷场饼轧之递空里宅传晚振雨翼易谣哲宁叮乌亩躺刘墙枝态规树搅麻县招瞧厦玉纸动挂搭敢染危窗淹披炸芝可窄鹊四排毙亏驻闯婆十秃顾式秒齿叔但敬台条清礼生掏胁受敲吉筑枪焦泰润飘撑逆葵蔑探现脊矮声雕荡屠地钻严笑挠叶询旁刺尾躁鞭耗顺狱冰牧初聋尸旧榆坡寨愤潮踩迁评申铃章蚂都师逼羞拳除夺筐每行芒圆棚蜓馆炮愿座审删巧掠决句也域兆腥倚价飞蝴寇蔽汇两躬刃扮左新串俯桥灌囊傲芹不聚装事鬼劲翁蒙陵赤依铺委技讯将更盾尖繁弊峰一洲耍冈允嗓卡狡简梦渡妥掘和耕月阻烈担呢煮钉钟团饱傍鄙贩胳赢脾他银触轿被向汤版拾笨她贸前摄时逝肢悄歼翅拣闷稀夫充羽殿下平岗剥虫绩泥楚巴菊汗凶镰冻北膨孩杨合慎虎良巾魔产喂漏拦头踏轰耽争孝舒拔纯白镜类棍善玩酸笋陶沾谱姜舰孤搂菜尿万奉絮至湾茫远舅卜呈祝邪轻宇催丰酬又把腰眉血沸耳蕉言洪沃民痛兰泪横丘陪宫谎写乎刚浓彼份彻裙命欣剂唇理研宽据鼠叛料禁史剃欠坏忽度特略么房悬渔榴便朽灯淘绘六宗确官拿保辜害件筝枯乞矿思系扣舟力咸齐倾孕绒盟沙匆锤猫程缺直阅怨分肌记篮撇有晌疼禽铜报砍避爱长累味划跟珠柜惊屿厨暴眼我仙椒壤拆仗梅疯硬插坛快活锹词票弹古疆皆足释艇泳够得原淡颂匠化惠重峡级鹿叼炒描享慨抽洞信灿筒赶凑踪爹悦折霸音肚粒土肾爸缝致服为案量萌滚默卵宜握喉习券收芦像贞爬疏臣席蚊威七过洁即循疤极次烂班风射圣览唤鲜心纱短叫鞠妻袖邮页臭萄臂助往家朴浮桂锻判雹泻拖印茄速塔晃障典岛州燃农沈派笼搞怒蒸咏饲术槐劫腾旱社森浅畏届机垮锈堵歇浙扎线切韵岩吞誓克找距贱拒笛梯晕积傅马转绳粱丈军需迈孔众还炎追罩餐护摊石剩鸦崖葡裤然昌熊恨蛇辈变兔断乃妙诸责励播侧吵扰缴雀龟哨偏汪绪杯药毅弯灶念宿纲纪暂恰昼渐法中旨瑞翠摔逮聪室捕菠饶旅核介毕金滴最懂城斑怀失际桐汽调约怕赔罐街骨帜伶纹侮缠照查山常裕通穗铸景厕秆栽廊慧欧捧采埋该围名骤反键猛而乓颠户乘歪奶筹潜务墓轮政衫亿肩谁赌屯破蹈疫喘旦源筋律拜真概患免犬胸显令脉船院叨俗剖祸饿努肝值增坑垒纺疗床衰扔倦稠胆龙泼延仪耐怖路编难贷俱竿钱参演居签悔材语劳锋拴腊奇烤锦吊授圈籍煌黑宙榜晶瘦撒扛勇锅厘层鱼钓素独塞压柄注慢游凝近瓦懒茅刻敏三职伟捎踢临艘忙膛备奋誉状挥劝尼萝沟群慕列猜如百型壶宪兄讽庙费间块索馋别响丝挑畅睡抛诉抢腹袜伍脏持水姻贝肥南上滔口超率季洗趴老日构阿盘嫩销减吨滑框片橘矩坦仇舱擦建丸幻勒桶火越及择趣骑罢赴讲裁冷境乒屋赚从订膀亚乐告拘钳辟抖油咐植则截底绢纳咱抵厚来亮朋村趟监进遭闻标弱怜秧贪刷捏姐葬那第悲预相俊元趋义赖车牵甜燥补午肯甲开死浪秋挡遣接位梢叹华阁界译纽面咬亡对丑冤震并茧虾箭着棉逗碌带迅挺枣辩尺各话惧捐跨唱鹰乖队酷哪肆蓝昨健仆松蛾形若妈萍留廉二疑组彩谊疮伪退江迷客驶舞剪痕检停违佩校谢猴望鸭吓馒啄宾拥喇爽裹伐后燕羡奖镇斗光饭蝶鹅莫顷在凭盈愁哥驾神悉搜了桑铁艳伙脚族炼翻花市洋闪睛顶奔梨府姿雁裂粪揪拉满姑战慌复策傻冒壳虽于喝锡孙钞具羊且怎蜡呜磁器竖姨鸡寸碍首管改岸滥宝泊苍桨锁滨付否读意寒毛驳仅住盐婚捡茂苦藏碎营涝倡它段哄均软竞无隶返准毯配捆陡详维喊痰晨堡庄宰犁普想替霉测覆安抬帘帮柱摩肤供肺盲胖粘绑乳限烧蜘扭录寄文阳兴盛井肉辰唐扑塌蔬偶央入字冠烫取园休束由斜济年棋驴苗见遥证蛛负弓梳艰似朱却狸芽大吹究倍随壁修攀选完捉秤帆再偿践海泛急小喜捞提亦胶广额办辞副舌跪蹲甚待协果脱娱赞愚漂隔坐乔私渠谷立德晋效夏妇缸涌公辽尝子考止投徐贴隆感阀湖单贼星宏溪骆消商榨身妄尤冬会抄仁奸异塘差属毒先粗算邻咳以体呀荣荐王康区就优茎圾污鼓麦摆视档忠非鉴匀寿脑炭制环赏氏恼俘糕帐裳猎烛搁妹柿斯辛液妨勿针绞例蜂篇迫某吧做拢警交董遇阴番嗽样访殖知袍摇掩少歉号睬幼节迎庭帝奥绕朵胀坚青诊饮岭天乙抓架品贺卧灭借纤蠢龄薯胡险援展货练离虹徒嚼棕氧缩滩躲劈占遗欺医耻僻糖蹦叙苏股跑璃扫周局耀早惹课侄港密损跃熟兼始代嫂布柴母必武久病膜俭症墨贵酱斧漫吃末骗肠店吼矛闸婶郑杠卖夹比眨内获叠掉辅钥沿谨食赛豪揭企凳示顽甩姥皱这扩炉财执缎册窃惯智归颗腔既终范骡匪省堂幸此吴领煎浑仔波蚁教困腐尽紫吸们妖献窝佛痒壮虚伤刮辣祖辫整闭醒九债抱女勤坊任卷灾凡忘恋撞衔较滋悼升八摘瞒寺斥势织暮洽父纠贫牲巷烘认道黄召魄使儿烦浆物尘放爆丧雄谅发毁秩草屡垃科宵迹你桌挽巨葱激抹太透训蚕械奏达杜租站颤恳敞劣稻垂醉棵熔林驼旋恶吐狼堆戚岔残鸽衣鼻计造铅暖秘梁削寻盗赠容栋融御雅绸析说关盖娃侍回购脆肿娘涨连钢所期牢伴讨铲颈醋互半蓄联拨加稿侨主厉款田啊啊啊啊
啊噢",
    "session_id": -1,
    "temperature": 0.6,
    "request_output_len": 512,
    "repetition_penalty": 1,
    "interactive_mode": false
}

response is empty:

{
    "text": "",
    "tokens": -144501870,
    "finish_reason": "stop"
}
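A minimal sketch of a client-side check for this failure mode, using only the endpoint and request fields shown above. The helper names (`build_payload`, `is_anomalous`, `send`) are hypothetical, and the sketch assumes the server returns a single non-streamed JSON object, as in the two responses above.

```python
import json
import urllib.request

# URL taken from the report; adjust host and port to your own server.
API_URL = "http://192.168.40.41:8888/v1/chat/interactive"


def build_payload(prompt: str) -> dict:
    """Build a request body with the same fields as the report."""
    return {
        "model": "yi",
        "prompt": prompt,
        "session_id": -1,
        "temperature": 0.6,
        "request_output_len": 512,
        "repetition_penalty": 1,
        "interactive_mode": False,
    }


def is_anomalous(response: dict) -> bool:
    """Return True for the failure pattern seen here: empty text or a negative token count."""
    return response.get("text", "") == "" or response.get("tokens", 0) < 0


def send(prompt: str, url: str = API_URL, timeout: float = 120.0) -> dict:
    """POST the prompt and decode the JSON reply (assumes a non-streamed response)."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With a running server, `is_anomalous(send(prompt))` would flag the second response above while accepting the first.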

At first I thought the problem was that Yi-34B-Chat only supports a 4k context, so I tried Nous-Capybara-34B, which should have a 200k context, but the problem is the same. I recall that in a previous version (I don't remember exactly which), setting --session_len 10000 solved this problem, but in the latest version it does not seem to have any effect.

Let me know if you need any more detail.
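To pin down the exact prompt length at which responses start failing, a binary search over prompt prefixes is one option. This is only a sketch: `find_max_ok_length` and the `is_ok` probe are hypothetical names, and it assumes failure is monotonic (once a prefix fails, every longer prefix fails too), which the report suggests but does not prove.

```python
from typing import Callable


def find_max_ok_length(prompt: str, is_ok: Callable[[str], bool]) -> int:
    """Return the largest n for which is_ok(prompt[:n]) is True.

    Assumes the empty prefix is fine and that failures are monotonic in length.
    """
    low, high = 0, len(prompt)  # invariant: the prefix of length `low` is OK
    while low < high:
        middle = (low + high + 1) // 2  # round up so the loop always progresses
        if is_ok(prompt[:middle]):
            low = middle
        else:
            high = middle - 1
    return low
```

In practice `is_ok` would send the prefix to the server and return True when the reply has non-empty text and a non-negative token count. Each probe is one request, so a 2000-character prompt needs about 11 requests.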

Reproduction

CMD to serve: lmdeploy serve api_server /path/to/yi --model-name yi --session_len 10000 --server_name 0.0.0.0 --server_port 8886 --instance_num 128 --tp 1

Environment

lmdeploy check_env
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6: NVIDIA A100-SXM4-80GB
CUDA_HOME: /home/llm-download/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.5  (built against CUDA 11.7)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

LMDeploy: 0.1.0+
transformers: 4.35.0
gradio: 3.50.2
fastapi: 0.104.1
pydantic: 2.5.0

Error traceback

No response

weicheng59 commented 8 months ago

Update: with LMDeploy 0.0.14+62282fe, after changing session_len in config.ini to 4096 and copying the Yi model class into model.py with session_len=4096 as an argument (so that self.session_len = session_len is set in __init__), I get a normal response to the 1987 Chinese character request.
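The config.ini part of this workaround could be scripted roughly as follows. This is a sketch under assumptions: the function name `set_session_len` is hypothetical, the file location is whatever your converted model uses, and the code updates any section that already contains a `session_len` key rather than assuming a particular section name.

```python
import configparser
from typing import List


def set_session_len(config_path: str, session_len: int) -> List[str]:
    """Set session_len in every section of an INI file that already defines it.

    Returns the names of the sections that were changed.
    """
    # interpolation=None keeps values containing '%' untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path, encoding="utf-8")

    changed = []
    for section in parser.sections():
        if parser.has_option(section, "session_len"):
            parser.set(section, "session_len", str(session_len))
            changed.append(section)

    # Note: configparser drops comments and lowercases keys when rewriting the file.
    with open(config_path, "w", encoding="utf-8") as handle:
        parser.write(handle)
    return changed
```

For example, `set_session_len("/path/to/config.ini", 4096)` would mirror the value used above. Back up the file first, since rewriting it discards comments.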

lvhan028 commented 8 months ago

@weicheng59 We are rushing to finish lmdeploy v0.2.0, which is planned for release on 1.15. @AllentDan will get back to this issue after the rush.

weicheng59 commented 8 months ago

Thanks for the reply. I will stay on version 0.0.14 for now and will help test this issue on v0.2.0 after the release. Good luck with the rush.

weicheng59 commented 8 months ago

This issue is not present in the latest build, v0.2.0.