go-skynet / helm-charts

go-skynet helm chart repository

[Help needed] CPU usage does not decrease after a request is completed #24

Open 3deep5me opened 11 months ago

3deep5me commented 11 months ago

Does anyone else have the problem that the CPU load does not decrease after a chat request completes?

I'm using CodeLlama-34B-Instruct-GGUF and the ChatGPT-Next-Web-UI.

With other bindings, e.g. ialacol, I do not have this problem.

The logs look normal:


Defaulted container "test-local-ai" out of: test-local-ai, download-model (init)
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name      : AMD EPYC-Milan Processor
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat umip pku ospke rdpid fsrm
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
@@@@@
2:33AM INF Starting LocalAI using 24 threads, with models path: /models
2:33AM INF LocalAI version: v1.30.0 (274ace289823a8bacb7b4987b5c961b62d5eee99)

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.49.2                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 70  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................ 14 │
 └───────────────────────────────────────────────────┘

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39597: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39639: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40599: connect: connection refused"
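
For reference: the startup banner above points at REBUILD=true and CMAKE_ARGS as the knobs for rebuilding the backend and disabling unsupported instruction sets. A minimal sketch of passing them as container environment variables on the LocalAI container, assuming your chart version lets you inject extra env (the single flag shown just mirrors the "no AVX512 found" line in the log):

# Sketch: extra environment for the LocalAI container (Kubernetes env list syntax).
# REBUILD and CMAKE_ARGS are the variables named in the startup banner above;
# how they are injected (values.yaml override, kustomize patch, ...) depends on the chart version.
env:
  - name: REBUILD
    value: "true"
  - name: CMAKE_ARGS
    value: "-DLLAMA_AVX512=OFF"   # this CPU reports AVX/AVX2 but no AVX512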
jamiemoller commented 8 months ago

@3deep5me did you end up resolving this issue? I saw this error when running with an incompatible CUDA version (I was accidentally running the CUDA 11 container with CUDA 12 on the host).
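
If the same mismatch is at play here, one quick thing to rule out is the image tag: pin an image whose CUDA major version matches the host driver. A rough values override, assuming the chart exposes the image under deployment.image as in recent chart versions (the tag below is illustrative, not verified; check the LocalAI registry for the exact CUDA 11 / CUDA 12 tag names):

# Sketch: pin an image variant that matches the host's CUDA major version.
deployment:
  image: quay.io/go-skynet/local-ai:v1.30.0-cublas-cuda12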