Ishiki-Iroha closed this 9 months ago
When I run on a single L40S, there is no problem. When testing the 13B model, did you use multiple cards or a single card? I want to make sure we have a clear understanding of the testing conditions.
llama-13B-chat on 1 L40S, result:
speed 67.24270358594262
speed0 22.81789898647827
ratio 2.9469279194280853
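The reported ratio appears to be `speed` divided by `speed0`. Below is a quick sanity check of that arithmetic. It assumes (not stated in the thread) that `speed` is the throughput of the accelerated run and `speed0` the baseline throughput, both in the same units. The helper name `speedup` is hypothetical.

```python
# Values copied from the llama-13B-chat run on 1 L40S reported above.
# Assumption: "speed" is the accelerated run's throughput and "speed0" is the
# baseline throughput, both measured in the same units.
speed = 67.24270358594262
speed0 = 22.81789898647827


def speedup(accelerated: float, baseline: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated / baseline


ratio = speedup(speed, speed0)
print(f"ratio {ratio:.4f}")
```

If the ratio matches, the two runs differ only in the decoding method. Hardware questions like single vs. multiple cards still matter when comparing against results from another setup, such as 2x RTX 3090.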
Did you use multiple cards or a single card?
We conducted the tests using 2x RTX 3090.
RuntimeError
Thank you for identifying this bug; it has now been fixed.
Hello, I have reproduced the results on vicuna7b and llama7b-chat with 8 L40S cards, and they are quite impressive:
However, when I tried llama13b-chat, I encountered the following problem:
When I run the following command, it works well: