go-skynet / helm-charts

go-skynet helm chart repository

[Help needed] CPU usage does not decrease after a request is completed #24

Open 3deep5me opened 11 months ago

3deep5me commented 11 months ago

Does anyone else have the problem that the CPU load does not decrease after a chat request completes?

I'm using CodeLlama-34B-Instruct-GGUF and the ChatGPT-Next-Web-UI.

With other bindings, e.g. ialacol, I do not have this problem.

The logs look normal:


Defaulted container "test-local-ai" out of: test-local-ai, download-model (init)
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name      : AMD EPYC-Milan Processor
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat umip pku ospke rdpid fsrm
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
@@@@@
2:33AM INF Starting LocalAI using 24 threads, with models path: /models
2:33AM INF LocalAI version: v1.30.0 (274ace289823a8bacb7b4987b5c961b62d5eee99)

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.49.2                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 70  Processes ........... 1 │
 │ Prefork ....... Disabled  PID ................ 14 │
 └───────────────────────────────────────────────────┘

rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39597: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:39639: connect: connection refused"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:40599: connect: connection refused"
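
For reference: the startup banner above points at REBUILD=true and CMAKE_ARGS as the knobs for rebuilding the backend and disabling unsupported instruction sets. A minimal sketch of passing them as container environment variables on the LocalAI container, assuming your chart version lets you inject extra env (the single flag shown just mirrors the "no AVX512 found" line in the log):

# Sketch: extra environment for the LocalAI container (Kubernetes env list syntax).
# REBUILD and CMAKE_ARGS are the variables named in the startup banner above;
# how they are injected (values.yaml override, kustomize patch, ...) depends on the chart version.
env:
  - name: REBUILD
    value: "true"
  - name: CMAKE_ARGS
    value: "-DLLAMA_AVX512=OFF"   # this CPU reports AVX/AVX2 but no AVX512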
jamiemoller commented 8 months ago

@3deep5me did you end up resolving this issue? I saw this error when running with an incompatible CUDA version (I was accidentally running the CUDA 11 container with CUDA 12 on the host).
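
If the same mismatch is at play here, one quick thing to rule out is the image tag: pin an image whose CUDA major version matches the host driver. A rough values override, assuming the chart exposes the image under deployment.image as in recent chart versions (the tag below is illustrative, not verified; check the LocalAI registry for the exact CUDA 11 / CUDA 12 tag names):

# Sketch: pin an image variant that matches the host's CUDA major version.
deployment:
  image: quay.io/go-skynet/local-ai:v1.30.0-cublas-cuda12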