LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Question: using CLBlast, AMD GPU usage is very low #278

Closed mikeyang01 closed 1 year ago

mikeyang01 commented 1 year ago

Expected Behavior

Use GPU instead of CPU

Current Behavior

During prompt processing, CPU usage is high while GPU usage stays low and barely changes: https://github.com/mikeyang01/ImageStorage/blob/main/2023-06-30%20132414.png

Environment and Context

AMD Ryzen™ 7 7735HS (the GPU is integrated into the CPU), Windows 11 Pro


Steps to Reproduce

Step 1: Download the latest version: https://github.com/LostRuins/koboldcpp/releases/tag/v1.33
Step 2: Run .\koboldcpp.exe --useclblast 0 0 --gpulayers 24 --threads 10
Step 3: Ask questions through localhost

Failure Logs

PS C:\Users\yy\Downloads> .\koboldcpp.exe --useclblast 0 0 --gpulayers 24 --threads 10
Welcome to KoboldCpp - Version 1.33
For command line arguments, please refer to --help
Otherwise, please manually select ggml file:
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.dll
==========
Loading model: C:\Github\models\wizard-vicuna-13B.ggmlv3.q4_0.bin
[Threads: 10, BlasThreads: 10, SmartContext: False]

---
Identified as LLAMA model: (ver 5)
Attempting to Load...
---
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
llama.cpp: loading model from C:\Github\models\wizard-vicuna-13B.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB

Platform:0 Device:0  - AMD Accelerated Parallel Processing with gfx1035

ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1035'
ggml_opencl: device FP16 support: true
CL FP16 temporarily disabled pending further optimization.
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 4947.02 MB (+ 1608.00 MB per state)
llama_model_load_internal: offloading 24 repeating layers to GPU
llama_model_load_internal: offloaded 24/43 layers to GPU
llama_model_load_internal: total VRAM used: 4085 MB
llama_new_context_with_model: kv self size  = 1600.00 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001

Input: {"n": 1, "max_context_length": 1024, "max_length": 80, "rep_pen": 1.08, "temperature": 0.7, "top_p": 0.92, "top_k": 0, "top_a": 0, "typical": 1, "tfs": 1, "rep_pen_range": 256, "rep_pen_slope": 0.7, "sampler_order": [6, 0, 1, 2, 3, 4, 5], "prompt": "hi there,\nI'm planning on writing a novel and I need some help with the plot.\nHere is what I have so far:\n1. The story takes place in 2055 where robots are common and society has fully integrated them into their daily lives.\n2.who are you?\n3.The protagonist is a young woman who discovers that she has the ability to control robots with her mind.\n4. She teams up with an underground resistance group that fights against a government that uses robots as weapons of war.\n5.hi, could you help me with my plot?\n    I'd be happy to help! What specifically do you need help with? Do you have any questions about your characters or the world they live in? Or perhaps you're looking for guidance on how to structure your story or create conflict?} \n6.hi, I need some help with my plot. Specifically, I want to know if it's possible for my protagonist to control robots using her mind without having any prior training or knowledge in programming or robotics? Is there any way she could learn quickly enough to use this ability effectively in battle?who are you? \n7.my name is emily, nice to meet you. In terms of your question, it's definitely possible for your protagonist to control robots using her mind without any prior training or knowledge in programming or robotics. However, it would be unlikely that she would be able to learn quickly enough to use this ability effectively in battle.how are you doing today? Could you please explain why? \n8.thank you for your help so far. 
Just to clarify, are you saying that it's not realistic for my protagonist to be able to learn how to control robots with her mind in a short amount of time, even with the help of an underground resistance group who specializes in robotics and programming?who are you", "quiet": true}

Processing Prompt [BLAS] (411 / 411 tokens)
Generating (80 / 80 tokens)
Time Taken - Processing:23.4s (57ms/T), Generation:33.7s (421ms/T), Total:57.1s (1.4T/s)
Output: ?
9.yes, that's exactly what I'm saying. While it's certainly possible for someone to develop psychic abilities, it's highly unlikely that they would be able to master such a complex skill set in a matter of weeks or months. Additionally, even if your protagonist were able to learn quickly, it's unlikely that she would be able to
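As a rough cross-check of the numbers in the log above (back-of-the-envelope arithmetic only, not koboldcpp's actual memory accounting):

```python
# Figures taken from the log above.
vram_used_mb = 4085       # "total VRAM used: 4085 MB"
layers_offloaded = 24     # "offloaded 24/43 layers to GPU"
total_layers = 43

mb_per_layer = vram_used_mb / layers_offloaded        # ~170 MB per layer
full_offload_mb = mb_per_layer * total_layers         # ~7.1 GB to offload everything
print(f"~{mb_per_layer:.0f} MB/layer, ~{full_offload_mb / 1024:.1f} GB for a full offload")
```

So a full offload of this 13B Q4_0 model would need several gigabytes of device memory, which is relevant to the integrated-GPU discussion in the replies.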
LostRuins commented 1 year ago

Doesn't look like a bug. Integrated GPUs aren't very fast.

mikeyang01 commented 1 year ago

Doesn't look like a bug. Integrated GPUs aren't very fast.

Thanks. Not a bug, just a question: I saw that using CLBlast causes high CPU usage while GPU usage doesn't change. I thought CLBlast should use the GPU? Sorry, I'll move this to the discussions channel.

FilJed commented 1 year ago

Doesn't look like a bug. Integrated GPUs aren't very fast.

OP's GPU is a Radeon 680M, which can even run Cyberpunk 2077 (I know that's a weak data point, though). I also have problems using CLBlast on a Ryzen 6600H with a 660M iGPU: it repeatedly emptied VRAM, and no GPU usage was observed either. It detects the GPU as gfx1035, but that's it. I found that the 660M wasn't tuned in CLBlast, but the 680M was. Since the 660M is a cut-down 680M, I hoped it would all work, but it did not. Can you advise, please?

LostRuins commented 1 year ago

Are you sure you selected the correct GPU?
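One way to double-check the selection is to enumerate the OpenCL platforms and devices in the same order the `--useclblast` indices use. A sketch, assuming the optional `pyopencl` package and a working OpenCL runtime (the helper names here are hypothetical, not part of koboldcpp):

```python
# Hypothetical helper: render one candidate --useclblast invocation.
def format_target(p_idx, d_idx, platform_name, device_name):
    return f"--useclblast {p_idx} {d_idx}  ->  {platform_name} / {device_name}"

def list_useclblast_targets():
    import pyopencl as cl  # pip install pyopencl; needs an OpenCL runtime
    return [
        format_target(p, d, plat.name, dev.name)
        for p, plat in enumerate(cl.get_platforms())
        for d, dev in enumerate(plat.get_devices())
    ]

# Example of the expected shape (matches the log line "Platform:0 Device:0"):
print(format_target(0, 0, "AMD Accelerated Parallel Processing", "gfx1035"))
```

If more than one platform/device pair shows up (e.g. a CPU OpenCL driver alongside the iGPU), the wrong pair of indices will silently run on the wrong device.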

FilJed commented 1 year ago

I did. Here you can see how VRAM usage dropped before any inference: Screenshot 2023-07-11 221714

LostRuins commented 1 year ago

Your device only has 512 MB of dedicated VRAM; that is not enough. Shared VRAM will not be any faster than normal RAM.
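For scale, rough arithmetic using the figures logged earlier in this thread (an illustration only, assuming both the offloaded layers and the KV cache would need to live in device memory):

```python
dedicated_vram_mb = 512     # FilJed's dedicated VRAM, per the reply above
offload_vram_mb = 4085      # "total VRAM used: 4085 MB" from the earlier log
kv_cache_mb = 1600          # "kv self size = 1600.00 MB" from the earlier log

shortfall_mb = offload_vram_mb + kv_cache_mb - dedicated_vram_mb
# Everything beyond the 512 MB budget spills into shared system RAM.
print(f"over budget by ~{shortfall_mb} MB")
```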

FilJed commented 1 year ago

Totally agreed. But correct me if I'm wrong: gfx1035 can allocate up to 8 GB of VRAM (it tries, but without success), and RAM speed matters less than the compute units themselves. I expected to see performance increase once the GPU units were doing their job, but I don't see them doing anything. It simply never started using the GPU, so there's no performance to compare. Anyway, big thanks for assisting. Let me know if there's anything else I could try.