ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: Vulkan backend does not work on an Imagination GPU on a RISC-V platform #8437

Closed yli147 closed 1 week ago

yli147 commented 1 month ago

What happened?

Nothing is output after "Vulkan0: PowerVR B-Series BXE-2-32 (PowerVR B-Series Vulkan Driver) | uma: 1 | fp16: 1 | warp size: 1". This is on a RISC-V board with an Imagination iGPU.

Name and Version

./llama-cli --version
version: 3369 (278d0e18)
built with cc (Ubuntu 13.2.0-4ubuntu3-bb2) 13.2.0 for riscv64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

root@k1:~/liyong/llama.cpp/build/bin# ./llama-cli -m ../../../Phi-3-mini-4k-instruct-fp16.gguf -p "Hi you how are you" -n 50 -e -ngl 33 -t 4
Log start
main: build = 3369 (278d0e18)
main: built with cc (Ubuntu 13.2.0-4ubuntu3-bb2) 13.2.0 for riscv64-linux-gnu
main: seed  = 1720706611
llama_model_loader: loaded meta data with 23 key-value pairs and 195 tensors from ../../../Phi-3-mini-4k-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   5:                           phi3.block_count u32              = 32
llama_model_loader: - kv   6:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi3.attention.head_count_kv u32              = 32
llama_model_loader: - kv   8:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv   9:                  phi3.rope.dimension_count u32              = 96
llama_model_loader: - kv  10:                          general.file_type u32              = 1
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32064]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32064]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 32000
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  130 tensors
llm_load_vocab: special tokens cache size = 323
llm_load_vocab: token to piece cache size = 0.1690 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = phi3
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32064
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 96
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 96
llm_load_print_meta: n_embd_head_v    = 96
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 3072
llm_load_print_meta: n_embd_v_gqa     = 3072
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 3.82 B
llm_load_print_meta: model size       = 7.12 GiB (16.00 BPW)
llm_load_print_meta: general.name     = Phi3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: EOT token        = 32007 '<|end|>'
llm_load_print_meta: max token length = 48
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: PowerVR B-Series BXE-2-32 (PowerVR B-Series Vulkan Driver) | uma: 1 | fp16: 1 | warp size: 1

yli147 commented 1 month ago

I then rebuilt with the options -DGGML_VULKAN_DEBUG=1 -DGGML_VULKAN_VALIDATE=1:

cmake -B build -DGGML_VULKAN=1 -DGGML_VULKAN_DEBUG=1 -DGGML_VULKAN_VALIDATE=1
cmake --build build --config Release -j8

and got the following logs:

root@k1:~/liyong/llama.cpp/build/bin# ./llama-cli -m ../../../ggml-model-q4_0.gguf -p "Hi you how are you" -n 50 -e -ngl 33 -t 4
Log start
main: build = 3369 (278d0e18)
main: built with cc (Ubuntu 13.2.0-4ubuntu3-bb2) 13.2.0 for riscv64-linux-gnu
main: seed = 1720773844
llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from ../../../ggml-model-q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = ..
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32003] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32003] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32003] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q4_0: 155 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens cache size = 262
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32003
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 606.54 MiB (4.63 BPW)
llm_load_print_meta: general.name = ..
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32002 '<|im_end|>'
llm_load_print_meta: max token length = 48
ggml_vk_instance_init()
ggml_vulkan: Validation layers enabled
ggml_vulkan: Found 1 Vulkan devices:
ggml_vk_print_gpu_info(0)
Vulkan0: PowerVR B-Series BXE-2-32 (PowerVR B-Series Vulkan Driver) | uma: 1 | fp16: 1 | warp size: 1
ggml_vk_get_device(0)
Initializing new vk_device
ggml_vk_find_queue_family_index()
ggml_vk_find_queue_family_index()
ggml_vk_create_queue()
ggml_vk_load_shaders(PowerVR B-Series BXE-2-32)
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_l, main, 3, 56, (128,128,1), specialization_constants, 1)
VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0).
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_m, main, 3, 56, (64,64,1), specialization_constants, 1) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_s, main, 3, 56, (32,32,1), specialization_constants, 1) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_l, main, 3, 56, (128,128,1), specialization_constants, 1) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_m, main, 3, 56, (64,64,1), specialization_constants, 1) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_s, main, 3, 56, (32,32,1), specialization_constants, 1) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) VUID-VkPipelineLayoutCreateInfo-descriptorType-03024(ERROR / SPEC): msgNum: -349439268 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-descriptorType-03024 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xeb2bfadc | vkCreatePipelineLayout(): max per-stage storage buffer bindings count (3) exceeds device maxPerStageDescriptorUpdateAfterBindStorageBuffers limit (0). 
The Vulkan spec states: The total number of descriptors with a descriptorType of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER and VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC accessible to any given shader stage across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxPerStageDescriptorUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-descriptorType-03024) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039(ERROR / SPEC): msgNum: 2004556686 - Validation Error: [ VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039 ] Object 0: handle = 0x2ababb9e50, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x777b1b8e | vkCreatePipelineLayout(): sum of storage buffer bindings among all stages (3) exceeds device maxDescriptorSetUpdateAfterBindStorageBuffers limit (0). The Vulkan spec states: The total number of descriptors of the type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER accessible across all shader stages and across all elements of pSetLayouts must be less than or equal to VkPhysicalDeviceDescriptorIndexingProperties::maxDescriptorSetUpdateAfterBindStorageBuffers (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkPipelineLayoutCreateInfo-pSetLayouts-03039) Objects: 1 [0] 0x2ababb9e50, type: 3, name: NULL
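The repeated validation errors above all reduce to the same two facts: each pipeline layout requests 3 storage buffer bindings, and the device reports a limit of 0 for both `maxPerStageDescriptorUpdateAfterBindStorageBuffers` and `maxDescriptorSetUpdateAfterBindStorageBuffers`. As a rough sketch for pulling those numbers out of a long log, something like the following could be used. `LimitViolation` and `parse_limit_violation` are hypothetical names introduced here for illustration; they are not part of llama.cpp or the Vulkan validation layers. The regular expression assumes the exact message wording shown in the log above.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper type, for illustration only (not from llama.cpp).
struct LimitViolation {
    std::string limit_name;  // e.g. maxPerStageDescriptorUpdateAfterBindStorageBuffers
    long requested;          // count the pipeline layout asked for
    long device_limit;       // limit the device reported
};

// Finds the first "(N) exceeds device <name> limit (M)" fragment in a
// validation message. Assumes the wording seen in the log above; returns
// std::nullopt when the message does not contain such a fragment.
std::optional<LimitViolation> parse_limit_violation(const std::string& message) {
    static const std::regex pattern(
        R"re(\((\d+)\) exceeds device (\w+) limit \((\d+)\))re");
    std::smatch match;
    if (!std::regex_search(message, match, pattern)) {
        return std::nullopt;
    }
    return LimitViolation{match[2].str(),
                          std::stol(match[1].str()),
                          std::stol(match[3].str())};
}
```

Calling `parse_limit_violation` on each message in the log should give a requested count of 3 and a device limit of 0 for both limit names, which makes it easier to confirm that every pipeline fails for the same reason.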

yli147 commented 1 month ago

The vulkaninfo output is attached: 1.txt

yli147 commented 1 month ago

I then rebuilt with Vulkan validation turned off, using these build options:

cmake -B build -DGGML_VULKAN=1 -DGGML_VULKAN_DEBUG=1
cmake --build build --config Release -j8

With only 23 layers offloaded to the iGPU, I got the following log:

root@k1:~/liyong/llama.cpp/build/bin# ./llama-cli -m ../../../Phi-3-mini-4k-instruct-q4.gguf -p "Hi you how are you" -n 50 -e -ngl 23 -t 4 Log start main: build = 3369 (278d0e18) main: built with cc (Ubuntu 13.2.0-4ubuntu3-bb2) 13.2.0 for riscv64-linux-gnu main: seed = 1720779322 llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from ../../../Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = phi3 llama_model_loader: - kv 1: general.name str = Phi3 llama_model_loader: - kv 2: phi3.context_length u32 = 4096 llama_model_loader: - kv 3: phi3.embedding_length u32 = 3072 llama_model_loader: - kv 4: phi3.feed_forward_length u32 = 8192 llama_model_loader: - kv 5: phi3.block_count u32 = 32 llama_model_loader: - kv 6: phi3.attention.head_count u32 = 32 llama_model_loader: - kv 7: phi3.attention.head_count_kv u32 = 32 llama_model_loader: - kv 8: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 9: phi3.rope.dimension_count u32 = 96 llama_model_loader: - kv 10: general.file_type u32 = 15 llama_model_loader: - kv 11: tokenizer.ggml.model str = llama llama_model_loader: - kv 12: tokenizer.ggml.pre str = default llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32064] = ["", "", "", "<0x00>", "<... llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32064] = [0.000000, 0.000000, 0.000000, 0.0000... llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32064] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... 
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1 llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000 llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0 llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 32000 llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess... llama_model_loader: - kv 23: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q4_K: 81 tensors llama_model_loader: - type q5_K: 32 tensors llama_model_loader: - type q6_K: 17 tensors llm_load_vocab: special tokens cache size = 323 llm_load_vocab: token to piece cache size = 0.1690 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = phi3 llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 32064 llm_load_print_meta: n_merges = 0 llm_load_print_meta: vocab_only = 0 llm_load_print_meta: n_ctx_train = 4096 llm_load_print_meta: n_embd = 3072 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 32 llm_load_print_meta: n_rot = 96 llm_load_print_meta: n_swa = 0 llm_load_print_meta: n_embd_head_k = 96 llm_load_print_meta: n_embd_head_v = 96 llm_load_print_meta: n_gqa = 1 llm_load_print_meta: n_embd_k_gqa = 3072 llm_load_print_meta: n_embd_v_gqa = 3072 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 8192 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 2 llm_load_print_meta: rope scaling = 
linear llm_load_print_meta: freq_base_train = 10000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_ctx_orig_yarn = 4096 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 3B llm_load_print_meta: model ftype = Q4_K - Medium llm_load_print_meta: model params = 3.82 B llm_load_print_meta: model size = 2.23 GiB (5.01 BPW) llm_load_print_meta: general.name = Phi3 llm_load_print_meta: BOS token = 1 llm_load_print_meta: EOS token = 32000 <|endoftext|> llm_load_print_meta: UNK token = 0 llm_load_print_meta: PAD token = 32000 <|endoftext|> llm_load_print_meta: LF token = 13 <0x0A> llm_load_print_meta: EOT token = 32007 <|end|> llm_load_print_meta: max token length = 48 ggml_vk_instance_init() ggml_vulkan: Found 1 Vulkan devices: ggml_vk_print_gpu_info(0) Vulkan0: PowerVR B-Series BXE-2-32 (PowerVR B-Series Vulkan Driver) | uma: 1 | fp16: 1 | warp size: 1 ggml_vk_get_device(0) Initializing new vk_device ggml_vk_find_queue_family_index() ggml_vk_find_queue_family_index() ggml_vk_create_queue() ggml_vk_load_shaders(PowerVR B-Series BXE-2-32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_l, main, 3, 56, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_m, main, 3, 56, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_s, main, 3, 56, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_l, main, 3, 56, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_m, main, 3, 56, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_s, main, 3, 56, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f32_f16_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_l, main, 3, 56, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_m, main, 3, 56, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_s, main, 3, 56, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_l, main, 3, 56, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_m, main, 3, 56, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_s, main, 3, 56, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_aligned_l, main, 3, 56, (128,128,1), 
specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_f16_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_1_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR 
B-Series BXE-2-32, matmul_q5_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_0_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_1_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_aligned_m, main, 3, 56, 
(64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q8_0_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q2_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q3_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q4_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q5_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q6_k_f32_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q6_k_f32_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q6_k_f32_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q6_k_f32_aligned_l, main, 3, 56, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_q6_k_f32_aligned_m, main, 3, 56, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, 
matmul_q6_k_f32_aligned_s, main, 3, 56, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_l, main, 4, 52, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_m, main, 4, 52, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_s, main, 4, 52, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_l, main, 4, 52, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_m, main, 4, 52, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_s, main, 4, 52, (32,32,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_l, main, 4, 52, (128,128,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_m, main, 4, 52, (64,64,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_s, main, 4, 52, (32,32,1), specialization_constants, 1) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_f16_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_1_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_1_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q8_0_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q8_0_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q8_0_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR 
B-Series BXE-2-32, matmul_id_q8_0_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q8_0_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q8_0_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q2_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q3_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR 
B-Series BXE-2-32, matmul_id_q4_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q4_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q5_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q6_k_f32_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q6_k_f32_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q6_k_f32_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, 
matmul_id_q6_k_f32_aligned_l, main, 4, 52, (128,128,1), specialization_constants, 128) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q6_k_f32_aligned_m, main, 4, 52, (64,64,1), specialization_constants, 64) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, matmul_id_q6_k_f32_aligned_s, main, 4, 52, (32,32,1), specialization_constants, 32) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_f32_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_f16_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_1_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_1_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q8_0_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q2_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q3_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q6_k_f32_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_f32_f16_f32, main, 3, 44, (1,1,1), 
specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_f16_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_1_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_1_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q8_0_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q2_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q3_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q4_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q5_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_q6_k_f16_f32, main, 3, 44, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_f32_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_f16_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q4_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q4_1_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, 
mul_mat_vec_id_q5_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q5_1_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q8_0_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q2_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q3_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q4_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q5_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_id_q6_k_f32, main, 4, 36, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, f32_to_f16, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q4_0, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q4_1, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q5_0, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q5_1, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q8_0, main, 2, 20, (4096,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q2_k, main, 2, 20, (16384,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q3_k, main, 2, 20, (16384,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q4_k, main, 2, 20, 
(8192,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q5_k, main, 2, 20, (16384,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, dequant_q6_k, main, 2, 20, (16384,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_f16, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q4_0, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q4_1, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q5_0, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q5_1, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q8_0, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_f32_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_f16_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q4_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q4_1_f32, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q5_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q5_1_f32, main, 3, 112, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, get_rows_q8_0_f32, main, 3, 112, (1024,1,1), specialization_constants, 1) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, split_k_reduce, main, 2, 8, (256,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_p021_f16_f32, main, 3, 24, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_mat_vec_nc_f16_f32, main, 3, 28, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, norm_f32, main, 2, 16, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, rms_norm_f32, main, 2, 16, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, cpy_f32_f32, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, cpy_f32_f16, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, cpy_f16_f16, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, add_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, mul_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, div_f32, main, 3, 112, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, scale_f32, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, sqr_f32, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, clamp_f32, main, 2, 80, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, gelu_f32, main, 2, 16, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, silu_f32, main, 2, 16, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, relu_f32, main, 2, 16, (512,1,1), specialization_constants, 1) 
ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, diag_mask_inf_f32, main, 2, 12, (512,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, soft_max_f32, main, 3, 28, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, soft_max_f32_f16, main, 3, 28, (1,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, rope_norm_f32, main, 4, 44, (1,512,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, rope_norm_f16, main, 4, 44, (1,512,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, rope_neox_f32, main, 4, 44, (1,512,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, rope_neox_f16, main, 4, 44, (1,512,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, argsort_f32, main, 2, 12, (1024,1,1), specialization_constants, 1) ggml_vk_create_pipeline(PowerVR B-Series BXE-2-32, sum_rows_f32, main, 2, 16, (1,1,1), specialization_constants, 1) ggml_vk_create_queue() ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) 
ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_backend_vk_buffer_type(0) ggml_vk_get_device(0) ggml_vk_get_device(0) llm_load_tensors: ggml ctx size = 0.20 MiB ggml_vk_get_device(0) ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(1618452480) ggml_vk_create_buffer(PowerVR B-Series BXE-2-32, 1618452480, { DeviceLocal }, { HostVisible | HostCoherent }) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb3ffe70) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb3fffe0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400150) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4002c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400430) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4005a0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400710) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400880) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4009f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400b60) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400cd0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400e40) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb400fb0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401120) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401290) 
ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401400) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401570) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4016e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401850) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4019c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401b30) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401ca0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401e10) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb401f80) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4020f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402260) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4023d0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402540) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4026b0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402820) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402990) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402b00) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402c70) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402de0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb402f50) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4030c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403230) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4033a0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403510) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403680) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 
(0x2ad74e7930), 0x2acb4037f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403960) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403ad0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403c40) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403db0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb403f20) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404090) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404200) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404370) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4044e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404650) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4047c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404930) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404aa0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404c10) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404d80) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb404ef0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405060) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4051d0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405340) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4054b0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405620) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405790) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405900) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405a70) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405be0) 
ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405d50) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb405ec0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406030) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4061a0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406310) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406480) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4065f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406760) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4068d0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406a40) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406bb0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406d20) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb406e90) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407000) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407170) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4072e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407450) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4075c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407730) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4078a0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407a10) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407b80) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407cf0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407e60) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb407fd0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 
(0x2ad74e7930), 0x2acb408140) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4082b0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408420) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408590) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408700) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408870) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4089e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408b50) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408cc0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408e30) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb408fa0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409110) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409280) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4093f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409560) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4096d0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409840) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb4099b0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409b20) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409c90) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409e00) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb409f70) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a0e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a250) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a3c0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a530) 
ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a6a0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a810) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40a980) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40aaf0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40ac60) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40add0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40af40) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b0b0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b220) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b390) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b500) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b670) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b7e0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40b950) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40bac0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40bc30) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40bda0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40bf10) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40c080) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40c1f0) ggml_backend_vk_buffer_init_tensor(0x2acc374c90 (0x2ad74e7930), 0x2acb40c360) ggml_vk_get_device(0) llm_load_tensors: offloading 23 repeating layers to GPU llm_load_tensors: offloaded 23/33 layers to GPU llm_load_tensors: PowerVR B-Series BXE-2-32 buffer size = 1543.48 MiB llm_load_tensors: CPU buffer size = 2281.66 MiB ggml_backend_vk_buffer_set_tensor(0x2acc374c90, 0x2acb3ffe70, 0x3f68ae60e0, 0, 12288) ggml_vk_buffer_write(12288) 
ggml_vk_buffer_write_2d(12288, 1)
ggml_vk_create_temporary_context()
ggml_vk_ctx_begin(PowerVR B-Series BXE-2-32)
ggml_vk_create_cmd_buffer()
ggml_vk_buffer_write_2d_async(12288, 1)
<...>
ggml_vk_sync_buffers()
ggml_vk_ctx_end(0x2acc1b6370, 1)
ggml_vk_submit(1, 0x2acb291530)
............................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_backend_vk_init(0)
ggml_vk_init(, 0)
ggml_vk_get_device(0)
ggml_vulkan memory: ggml_backend_vk_buffer_type_alloc_buffer(1157627904)
ggml_vk_create_buffer(PowerVR B-Series BXE-2-32, 1157627904, { DeviceLocal }, { HostVisible | HostCoherent })
(16134) PVR:(Error): BridgePhysmemNewRamBackedPMR() failed (PVRSRV_ERROR_PMR_FAILED_TO_ALLOC_PAGES) in DevmemXAllocPhysical() [ :342 ]
(16134) PVR:(Error): DevmemXAllocPhysical() failed (PVRSRV_ERROR_PMR_FAILED_TO_ALLOC_PAGES) in PVRSRVDevMemXAllocPhysical() [ :45 ]
ggml_vulkan: Device memory allocation of size 1157627904 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
ggml_backend_vk_free(Vulkan0)
ggml_vk_cleanup(Vulkan0)
ggml_vk_graph_cleanup()
<...>
ggml_vk_queue_cleanup()
ggml_vk_queue_cleanup()
llama_init_from_gpt_params: error: failed to create context with model '../../../Phi-3-mini-4k-instruct-q4.gguf'
ggml_vulkan memory: ggml_backend_vk_buffer_free_buffer()
~vk_buffer_struct(0x2ad74e8830, 1618452480)
main: error: unable to load model
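
For reference, the size of the failed allocation (1157627904 bytes) appears to be the KV cache for the 23 offloaded layers. The sketch below reproduces that number from the metadata and log values above. It assumes an f16 KV cache (2 bytes per element) and the usual "K plus V, per layer, per context position" sizing; the function name `kv_cache_bytes` is made up for illustration and is not a llama.cpp API.

```python
# Values taken from the GGUF metadata and the log above.
n_ctx = 4096         # llama_new_context_with_model: n_ctx
n_embd = 3072        # phi3.embedding_length
n_head = 32          # phi3.attention.head_count
n_head_kv = 32       # phi3.attention.head_count_kv
n_layer_gpu = 23     # "offloaded 23/33 layers to GPU"
bytes_per_elem = 2   # ASSUMPTION: f16 KV cache

def kv_cache_bytes(n_layers, n_ctx, n_embd, n_head, n_head_kv, bytes_per_elem):
    """Hypothetical helper: estimated bytes for the K and V caches of n_layers layers."""
    # Width of one K (or V) row per token: head dimension times number of KV heads.
    n_embd_kv = (n_embd // n_head) * n_head_kv
    # Factor 2 covers the separate K and V tensors.
    return 2 * n_layers * n_ctx * n_embd_kv * bytes_per_elem

gpu_kv = kv_cache_bytes(n_layer_gpu, n_ctx, n_embd, n_head, n_head_kv, bytes_per_elem)
print(gpu_kv)                      # 1157627904, the size reported in the log
print(gpu_kv / (1024 * 1024))      # 1104.0 MiB
```

Under these assumptions the request scales linearly with context size, so running with a smaller context (for example `-c 2048`) would halve this buffer. Whether the PowerVR driver can then satisfy the allocation on top of the 1543.48 MiB model buffer is not something the log answers.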

yli147 commented 1 month ago

It seems the issue is very similar to https://github.com/ggerganov/llama.cpp/issues/5441

yli147 commented 1 month ago

Any thoughts? Thanks.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 14 days since being marked as stale.