Code completion is often slow. Are there any configuration options to optimize it?

Phenomenon description: In some cases, code completion lags and returns no completion for a long time, sometimes more than 10 seconds.

Known information: The llama-server process's CPU usage can reach 100%, while average GPU utilization is around 20% and peaks at no more than 40%.

Machine configuration: Single GPU, NVIDIA V100*8, 64C/256G

Are there any deployment configurations that would improve performance?

Thanks!

TabbyML/tabby, issue #2493