Hi, thank you for your amazing work. I have two small questions:
Will the `max_total_token_num` parameter affect the results? I am trying to build an inference server on a 24 GB GPU, and I noticed that this parameter can cause out-of-memory (OOM) errors. If I decrease its value, will it negatively impact the test results? Thank you so much!
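For reference, here is the rough memory arithmetic behind my OOM concern. This is only a back-of-envelope sketch assuming a 7B-class fp16 model; the layer count, head count, head size, weight size, and overhead figure are my own guesses, not numbers from this project:

```python
# Rough estimate of how many KV-cache tokens fit on a 24 GB GPU.
# All model dimensions below are assumptions for a typical 7B model,
# not values taken from this repository.

GPU_MEM_GB = 24      # total GPU memory
WEIGHTS_GB = 14      # ~7B parameters in fp16 (assumption)
OVERHEAD_GB = 2      # activations, CUDA context, fragmentation (rough guess)

num_layers = 32      # assumed
num_kv_heads = 32    # assumed (no grouped-query attention)
head_dim = 128       # assumed
bytes_per_elem = 2   # fp16

# K and V each store (num_kv_heads * head_dim) values per layer, per token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

free_bytes = (GPU_MEM_GB - WEIGHTS_GB - OVERHEAD_GB) * 1024**3
max_tokens = int(free_bytes // kv_bytes_per_token)

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Approx. upper bound for max_total_token_num: {max_tokens}")
```

Under these assumptions the KV cache costs about 512 KiB per token, which suggests roughly 16k tokens of total capacity on this card, hence my question about whether lowering the value changes the results.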