TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Code completion is often slow. Are there any configuration options to optimize it? #2493

Open 5bug opened 3 months ago

5bug commented 3 months ago

Problem description: Code completion sometimes lags and returns nothing for a long time, occasionally more than 10 seconds.

Known information: CPU usage of the llama-server process can reach 100%, while average GPU utilization is around 20% and peaks at no more than 40%.

Machine configuration: Single GPU, NVIDIA V100*8, 64C/256G

Are there any deployment configuration options that would improve performance?

Thanks!