Open anaivebird opened 1 day ago
GPU memory leak when max_tokens = 1
Can you try it without gather_all_token_logits?
For the case with gather_all_token_logits, we need to investigate it.
Thanks. To the best of my memory, it works well without gather_all_token_logits.
System Info
Who can help?
@byshiue @juney-nvidia @ncomly-nvidia
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
1000 requests should finish normally.
actual behavior
additional notes
no