flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0

Using CUDA graphs with SpecInfer causes it to block #1314

Open: lambda7xx opened this issue 7 months ago

lambda7xx commented 7 months ago

The log:

[0 - 7fb5bc50b000]    0.244498 {3}{RequestManager}: [1000014]New request tokens: 1 14350 5381 3814 363 385 319 29902 5001 376 29911 5171 29908 304 7780 300 260 1169 373 27637 1308 29889
[0]14350
[1]4280
[2]775
[3]304
[4]15833
[5]7339
[6]29915
[7]29879
[8]17294
[9]29892
[10]372
[11]338
[12]6364
[13]491
[14]25570
[15]322
[16]3196
[17]12417
[18]1994
[19]29889
Num of SSMs: 1
[0 - 7fb5bc50b000]    0.244543 {3}{RequestManager}: [1000015]New request tokens: 1 14350 4280 775 304 15833 7339 29915 29879 17294 29892 372 338 6364 491 25570 322 3196 12417 1994 29889
[0]3185
[1]763
[2]6682
[3]350
[4]1038
[5]322
[6]2649
[7]263
[8]2958
[9]446
[10]1048
[11]2581
[12]459
[13]538
[14]29891
Num of SSMs: 1
[0 - 7fb5bc50b000]    0.244572 {3}{RequestManager}: [1000016]New request tokens: 1 3185 763 6682 350 1038 322 2649 263 2958 446 1048 2581 459 538 29891
[0]14350
[1]263
[2]26576
[3]920
[4]1260
[5]265
[6]19533
[7]20147
[8]304
[9]323
[10]5171
[11]1363
[12]540
[13]6091
[14]260
[15]1169
[16]6963
[17]14568
[18]310
[19]274
[20]1161
Num of SSMs: 1
reyna-abhyankar commented 7 months ago

This could be related to #1319.