L1aoXingyu / llm-infer-bench


Question about max_total_token_num. #2

Open Dominic789654 opened 1 year ago

Dominic789654 commented 1 year ago

Hi, thank you for your amazing work. I have two small questions:

  1. Will the max_total_token_num parameter affect the benchmark results? I am trying to run the inference server on a 24 GB GPU, and the default value causes out-of-memory (OOM) errors. If I decrease this value, will it negatively impact the test results?
  2. Could you provide me with the code for plotting?
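For context on question 1: `max_total_token_num` typically bounds the number of tokens whose KV cache can be resident at once, so lowering it mainly limits concurrency/batch size rather than changing per-request outputs. A rough, hedged way to pick a value for a given memory budget is to divide the bytes available for KV cache by the per-token KV footprint. The sketch below is only a back-of-envelope estimate (the function name and the example model shape are illustrative, not taken from this repo):

```python
def estimate_max_total_tokens(
    free_bytes: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int = 2,  # fp16/bf16 KV cache
) -> int:
    """Rough upper bound on max_total_token_num for a KV-cache memory budget.

    Per token, each layer stores a key and a value vector:
        2 * num_kv_heads * head_dim * dtype_bytes bytes.
    This ignores activations, fragmentation, and framework overhead,
    so leave headroom in practice.
    """
    kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return free_bytes // kv_bytes_per_token


# Example: a 7B-class model (32 layers, 32 KV heads, head_dim 128, fp16)
# with ~10 GiB left for KV cache after weights on a 24 GB card.
budget = 10 * 1024**3
print(estimate_max_total_tokens(budget, num_layers=32, num_kv_heads=32, head_dim=128))
```

If the estimate is well above your benchmark's total concurrent tokens, reducing `max_total_token_num` to fit memory should not distort throughput numbers; if it is below, requests will queue and measured throughput/latency will change.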

Thank you so much!