Jimmy-L99 opened 3 days ago
branch: dev
INFO:asyncio:input_tokens: <|system|> User will provide you with a speech instruction. Do it step by step. First, think about the instruction and respond in a interleaved manner, with 13 text token followed by 26 audio tokens. <|user|> 你能读出下面这首岳阳楼记吗?: 庆历四年春,滕子京谪(zhé)守巴陵郡。越明年,政通人和,百废具兴。乃重修岳阳楼,增其旧制,刻 唐贤今人诗赋于其上。属(zhǔ)予(yú)作文以记之。 予观夫(fú)巴陵胜状,在洞庭一湖。衔远山,吞长江,浩浩汤汤(shāng),横无际涯;朝晖夕阴,气象万千。此则岳阳楼之大观也。前人之述备矣。然则北通巫峡,南极潇湘,迁客骚人,多会于此,览物之情,得无异乎? 若夫淫雨霏霏,连月不开;阴风怒号,浊浪排空;日星隐曜(yào),山岳潜形;商旅不行,樯(qiáng)倾楫(jí)摧;薄(bó)暮冥冥,虎啸猿啼。登斯楼也,则有去国怀乡,忧谗畏讥,满目萧然,感极而悲者矣。 至若春和景明,波澜不惊,上下天光,一碧万顷;沙鸥翔集,锦鳞游泳,岸芷(zhǐ)汀(tīng)兰,郁郁青青。而或长烟一空,皓月千里,浮光跃金,静影沉璧;渔歌互答,此乐何极!登斯楼也,则有心旷神怡,宠辱偕忘,把酒临风,其喜洋洋者矣。 嗟(jiē)夫(fú)!予(yú)尝求古仁人之心,或异二者之为,何哉(zāi)? 不以物喜,不以己悲;居庙堂之高则忧其民;处(chǔ)江湖之远则忧其君。是进亦忧,退亦忧。然则何时而乐耶?其必曰:“先天下之忧而忧,后天下之乐而乐”乎。噫(yī)!微斯人,吾谁与归? <|assistant|>streaming_transcription
INFO: 172.16.21.155:56372 - "POST /audio HTTP/1.1" 200 OK
INFO 11-21 09:45:16 metrics.py:349] Avg prompt throughput: 9.6 tokens/s, Avg generation throughput: 0.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.2%, CPU KV cache usage: 0.0%.
INFO:asyncio:Latency for generating first token: 0.2702457904815674 seconds
INFO 11-21 09:45:22 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:27 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 18.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:32 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:37 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 18.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:42 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:48 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 13.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:53 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 14.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%.
INFO 11-21 09:45:58 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:03 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:09 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:15 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.5%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:22 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:29 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:34 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 12.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:41 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 3.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:46 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:51 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:00 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:05 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:10 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 3.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:16 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 5.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:22 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 5.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:27 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 3.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.7%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:34 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 5.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:40 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:47 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:47:54 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 3.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:48:01 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:48:08 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 4.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:48:15 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 3.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 11-21 09:48:23 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO:asyncio:Decode efficiency: 7.97125203148865 tokens/second
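To put numbers on the slowdown, the periodic stats lines above (they appear to come from vLLM's `metrics.py` logger) can be parsed to extract the generation throughput and GPU KV cache usage over time. This is a minimal sketch that assumes the exact line format shown in this log; `parse_metrics` and `slowdown_ratio` are hypothetical helper names, not part of vLLM or this repository.

```python
import re

# Pattern for the periodic stats lines shown in the log (format assumed from
# this paste); only the timestamp, generation throughput and GPU KV % are kept.
METRICS_RE = re.compile(
    r"INFO (\d{2}-\d{2} \d{2}:\d{2}:\d{2}) metrics\.py:\d+\] "
    r".*?Avg generation throughput: ([\d.]+) tokens/s"
    r".*?GPU KV cache usage: ([\d.]+)%"
)


def parse_metrics(log_text):
    """Return (timestamp, generation tokens/s, GPU KV cache %) per stats line."""
    return [(ts, float(gen), float(kv)) for ts, gen, kv in METRICS_RE.findall(log_text)]


def slowdown_ratio(rows):
    """Peak generation throughput divided by the last reported value."""
    peak = max(gen for _, gen, _ in rows)
    return peak / rows[-1][1]


# Three lines copied verbatim from the log above.
SAMPLE = """\
INFO 11-21 09:45:27 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 18.8 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.
INFO 11-21 09:46:29 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.6%, CPU KV cache usage: 0.0%.
INFO 11-21 09:48:23 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
"""

if __name__ == "__main__":
    rows = parse_metrics(SAMPLE)
    for ts, gen, kv in rows:
        print(f"{ts}  generation={gen:5.1f} tok/s  gpu_kv={kv:.1f}%")
    print(f"peak/last throughput: {slowdown_ratio(rows):.1f}x")
```

Run against the full log, this makes the trend explicit: generation throughput peaks at 18.8 tokens/s and falls to 2.7 tokens/s by the last sample, while GPU KV cache usage never exceeds 0.8%, so the cache does not look close to full.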
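Inference gets slower and slower as generation goes on, and in the end the model stopped after reading only about half of 《岳阳楼记》 (*On Yueyang Tower*). The saved audio output is attached: complete_audio.zip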
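Long-text speech generation is currently limited. Please keep an eye out for the upcoming technical report; at present the model performs best on short texts.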
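Given that short inputs currently work best, one possible client-side workaround (not something the maintainers have proposed) is to split a long passage at sentence-ending punctuation into short segments and send each as a separate request. The helper `split_for_tts` and its `max_chars=60` default below are hypothetical and chosen only for illustration; the segment length that actually works well for this model is unknown.

```python
import re


def split_for_tts(text, max_chars=60):
    """Split text into segments of at most max_chars, preferring sentence ends.

    Hypothetical helper: max_chars=60 is an arbitrary placeholder, not a
    documented limit of the model.
    """
    # Zero-width split after full-width or ASCII sentence-ending punctuation,
    # so each sentence keeps its terminator; whitespace-only pieces are dropped.
    sentences = [s for s in re.split(r"(?<=[。！？；!?;])", text) if s.strip()]
    segments, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than max_chars.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new segment if adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            segments.append(current)
            current = ""
        current += sentence
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    passage = "庆历四年春,滕子京谪守巴陵郡。越明年,政通人和,百废具兴。乃重修岳阳楼,增其旧制。"
    for segment in split_for_tts(passage, max_chars=20):
        print(segment)
```

Each segment would then be synthesized in its own request (for example, against the same `/audio` endpoint seen in the log) and the audio concatenated afterwards; whether this actually avoids the slowdown and early stop has not been verified.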