'The type initializer for 'LLama.Native.NativeApi' threw an exception' in MAUI App

nappa0326 commented 12 months ago

I thought the problem was similar to #58, but I could not solve it with that issue's solution.

First, I added the following code to the constructor of the MAUI app's MainPage.xaml.cs, as created by the wizard.

string modelPath = "{model_path}";

// Load model
var parameters = new ModelParams(modelPath)
{
    ContextSize = 2048,
    Seed = 1337,
    GpuLayerCount = 5,
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var ex = new InteractiveExecutor(context);
ChatSession session = new ChatSession(ex);

Then, when I run the application, the following exception occurs.

System.TypeInitializationException HResult=0x80131534 Message=The type initializer for 'LLama.Native.NativeApi' threw an exception. Source=LLamaSharp

RuntimeError: The native library cannot be found. It could be one of the following reasons:

No LLamaSharp backend was installed. Please search LLamaSharp.Backend and install one of them.

You are using a device with only CPU but installed cuda backend. Please install cpu backend instead.

The backend is not compatible with your system cuda environment. Please check and fix it. If the environment is expected not to be changed, then consider build llama.cpp from source or submit an issue to LLamaSharp.

One of the dependency of the native library is missed.

The environment is as follows.

Windows 11 [Version 10.0.22621.2283]
maui-windows 7.0.92/7.0.100
VS 17.7.34031.279
LLamaSharp 0.5.1
LLamaSharp.Backend.Cuda12 0.5.1

[Screenshot 2023-09-23 140214]

Note that the same code works fine when placed in a console application created by the VS2022 wizard in the same environment. The following is the output of that application.

ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from C:\Users\nappa\AppData\Local\Packages\91f0e9ee-d8fd-40e7-b386-4f48ff88f821_jmyq9vndw01x8\LocalState\llama-2-7b-guanaco-qlora.Q5_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight q5_K [ 4096, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q5_K [ 4096, 4096, 1, 1 ]
[tensors 2 to 288 omitted: the remaining blk.0 to blk.31 attention, FFN and norm weights]
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight q6_K [ 4096, 32000, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - kv 18: general.quantization_version u32
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 2048
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q5_K - Medium
llm_load_print_meta: model size = 6.74 B
llm_load_print_meta: general.name = mikael110_llama-2-7b-guanaco-fp16
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 3858.18 MB (+ 1024.00 MB per state)
llm_load_tensors: offloading 5 repeating layers to GPU
llm_load_tensors: offloaded 5/35 layers to GPU
llm_load_tensors: VRAM used: 703 MB
llama_new_context_with_model: kv self size = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 153.41 MB
llama_new_context_with_model: VRAM scratch buffer: 152.00 MB

AsakusaRinne commented 11 months ago

I'm not familiar with MAUI, but I'd like to help you with it. Could you please search for the libllama file in the bin folder of your MAUI app? (I assume there's a folder containing everything needed to run the MAUI app.)

nappa0326 commented 11 months ago

@AsakusaRinne I found the 'libllama.dll' file in the bin folder ('bin\Debug\net7.0-windows10.0.19041.0\win10-x64') when using either 'LLamaSharp.Backend.Cuda12' or 'LLamaSharp.Backend.Cpu'.

The exceptions that occurred were the same in both cases.

Thank you.

AsakusaRinne commented 10 months ago

Recently, a common cause of this error has been using the backend packages on a machine without AVX2 support. Could you please check whether your PC supports AVX2?
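One way to check this from .NET itself is the hardware-intrinsics API, which exposes an IsSupported flag per instruction set. The sketch below is only an illustration: CpuFeatureCheck is a hypothetical helper name and is not part of LLamaSharp.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper (not part of LLamaSharp): reports whether the
// current process can use the AVX and AVX2 instruction sets.
public static class CpuFeatureCheck
{
    public static string Describe()
    {
        // IsSupported is false on CPUs (or runtimes) without the instruction set.
        return $"AVX: {Avx.IsSupported}, AVX2: {Avx2.IsSupported}";
    }
}
```

Calling Console.WriteLine(CpuFeatureCheck.Describe()) from the failing app should show whether AVX2 is available to that process.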

lcarrere commented 9 months ago

I can confirm that the CUDA backend can't work in a MAUI app on Windows with the current implementation. Because MAUI apps run in a sandbox, the native library search does not honor the system PATH environment variable. llama.cpp therefore fails to initialize, since it needs to resolve the CUDA SDK libraries.
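To see whether dependency resolution is what fails, one can try loading the native library directly with .NET's NativeLibrary.TryLoad, which returns false when the file or one of its dependencies cannot be loaded. This is a minimal diagnostic sketch; NativeProbe is a hypothetical helper name, and the path passed to it is whatever libllama.dll location the app actually uses.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical diagnostic helper (not part of LLamaSharp).
public static class NativeProbe
{
    // Returns true if the library at 'path' can be loaded into the
    // current process, which also requires its dependencies to resolve.
    public static bool CanLoad(string path)
    {
        if (!File.Exists(path))
        {
            return false;
        }

        if (NativeLibrary.TryLoad(path, out IntPtr handle))
        {
            // Release the handle again; this was only a probe.
            NativeLibrary.Free(handle);
            return true;
        }

        return false;
    }
}
```

If the file exists but CanLoad returns false inside the MAUI app while the same call succeeds in a console app, that points to a missing dependency in the MAUI process rather than a missing libllama.dll.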

To me there are 2 options:

1/ The developer packages the CUDA dependencies with their app. See: https://docs.nvidia.com/cuda/eula/index.html#attachment-a

2/ From within LLamaSharp, replace:

if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    cudaPath = Environment.GetEnvironmentVariable("CUDA_PATH");
    if (cudaPath is null)
    {
        return -1;
    }
    version = GetCudaVersionFromPath(cudaPath);
}

with:

if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    cudaPath = Environment.GetEnvironmentVariable("CUDA_PATH");
    if (cudaPath is null)
    {
        return -1;
    }

    // Ensure the CUDA bin path is reachable, especially in a MAUI environment.
    string cudaBinPath = Path.Combine(cudaPath, "bin");

    if (Directory.Exists(cudaBinPath))
    {
        AddDllDirectory(cudaBinPath);
    }

    version = GetCudaVersionFromPath(cudaPath);
}

The AddDllDirectory definition:

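    internal static class LocalGGUFRuntime
    {
        // AddDllDirectory returns an opaque, pointer-sized cookie (zero on failure),
        // so it is declared as IntPtr rather than int.
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        private static extern IntPtr AddDllDirectory(string NewDirectory);

        // (the rest of the class was not included in the original comment)
    }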
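For reference, the patch above can be sketched as a standalone helper. This is only an illustration under stated assumptions: CudaSearchPath and TryAddCudaBin are hypothetical names, not LLamaSharp APIs, and the cudaPath argument is assumed to be the value of the CUDA_PATH environment variable. The only Win32 call used is AddDllDirectory from kernel32.dll, which expects an absolute path.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of LLamaSharp) wrapping the patch idea.
public static class CudaSearchPath
{
    // Returns a non-zero cookie on success and IntPtr.Zero on failure.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr AddDllDirectory(string newDirectory);

    // Returns true if <cudaPath>\bin was added to the process DLL search path.
    public static bool TryAddCudaBin(string cudaPath)
    {
        // AddDllDirectory only exists on Windows.
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false;
        }

        if (string.IsNullOrEmpty(cudaPath))
        {
            return false;
        }

        string binPath = Path.Combine(cudaPath, "bin");
        if (!Directory.Exists(binPath))
        {
            return false;
        }

        IntPtr cookie = AddDllDirectory(binPath);
        if (cookie == IntPtr.Zero)
        {
            // Surface the Win32 error instead of failing silently.
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }

        return true;
    }
}
```

A call such as CudaSearchPath.TryAddCudaBin(Environment.GetEnvironmentVariable("CUDA_PATH")) would have to run before the first native call into llama.cpp. According to the Win32 documentation, directories added this way are consulted only when the loader searches with LOAD_LIBRARY_SEARCH_USER_DIRS (for example after SetDefaultDllDirectories), so whether this alone is sufficient may depend on how the host process loads libraries.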

AsakusaRinne commented 9 months ago

@lcarrere In v0.8.0 we added CPU and CUDA feature detection, along with an API that lets you specify the path of the library yourself. As you mentioned, a MAUI app may search for native libraries differently, so could you please use NativeLibraryConfig.Default.WithLibrary({YOUR_PATH}) to specify the path of the native library you want to load?

Note that the code above must be added before any llama model is initialized. We strongly recommend putting it at the very beginning of your code.

if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
{
    cudaPath = Environment.GetEnvironmentVariable("CUDA_PATH");
    if (cudaPath is null)
    {
        return -1;
    }

    // Ensure the CUDA bin path is reachable, especially in a MAUI environment.
    string cudaBinPath = Path.Combine(cudaPath, "bin");

    if (Directory.Exists(cudaBinPath))
    {
        AddDllDirectory(cudaBinPath);
    }

    version = GetCudaVersionFromPath(cudaPath);
}

If it works for you, we'd like to add it to LLamaSharp. :)

lcarrere commented 9 months ago

@AsakusaRinne NativeLibraryConfig.Instance.WithLibrary() works. However, with the patch I suggested, it isn't required: MAUI works with automatic backend loading and CUDA runtime loading.

To be clear: with my patch applied, MAUI works using the default backend loading. The patch is only intended to allow the current process to resolve the native DLLs of the CUDA runtime.

Let me know if I am not clear enough.

AsakusaRinne commented 9 months ago

Thank you for your feedback. A few questions still confuse me:

If a user has CUDA installed on Windows and is able to run CUDA-based code, is it still possible that a MAUI app cannot make use of CUDA without some specific settings?
When you publish your MAUI app, is it acceptable for it to contain more than one native library? And when the user installs the app, is it possible to put the files into a specified path?
As a MAUI developer, what would be the best way for you to develop and publish a MAUI app? (You don't need to take feasibility into account.)

In fact, I've long been planning to add an integration or example for MAUI because it's the most popular cross-platform UI framework in .NET. It has been on the TODO list of this repo since the beginning. However, I don't know much about it, and my time has been so limited over the past 3 months that I haven't even started. At the very least, I'll do something to make it more convenient for MAUI developers to use LLamaSharp. :)

lcarrere commented 9 months ago

If a user has CUDA installed on Windows and is able to run CUDA-based code, is it still possible that a MAUI app cannot make use of CUDA without some specific settings?

It depends on the deployment strategy.

1- If the application is deployed expecting the CUDA Toolkit to be installed on the platform: the developer has to ensure that the CUDA runtime path is included in the set of directories searched by the Windows native library lookup mechanism. As MAUI apps run in a sandbox mode, the directories defined by the PATH environment variable are ignored. Since the default llama.cpp runtime loading strategy of LLamaSharp is based on the system configuration, I would say LLamaSharp bears the responsibility of ensuring that the correct CUDA runtime is reachable.

2- If the application is deployed with all required runtimes included: it's up to the application developer to guarantee the availability of these runtimes in the application bin path. This is trickier and probably implies legal constraints. By allowing custom runtime loading in the latest code base, I would say you've opened Pandora's box :) In this mode, LLamaSharp cannot determine which CUDA runtime was used to compile the llama.cpp backend. So technically, you should also allow the calling app, on Windows, to specify the CUDA runtime path that the backend is compatible with.
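
To make the two strategies concrete, here is a rough sketch of how a calling app could pick the CUDA runtime directory. Everything in it is an assumption for illustration: the class CudaRuntimeLocator, the method Resolve, and the bundled folder name "cuda" are hypothetical and not part of LLamaSharp. Preferring the bundled runtime over the system one is also just one possible design choice.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical helper, not part of LLamaSharp.
internal static class CudaRuntimeLocator
{
    // Assumed folder name for a CUDA runtime shipped inside the app.
    private const string BundledFolder = "cuda";

    // Returns the directory that should hold the CUDA runtime DLLs,
    // or null when neither deployment strategy applies.
    public static string? Resolve(string appBaseDirectory, string? cudaPath)
    {
        // Strategy 2: the runtime is shipped with the application.
        string bundled = Path.Combine(appBaseDirectory, BundledFolder);
        if (Directory.Exists(bundled))
            return bundled;

        // Strategy 1: the CUDA Toolkit is installed on the system (CUDA_PATH).
        if (!string.IsNullOrWhiteSpace(cudaPath))
        {
            string systemBin = Path.Combine(cudaPath, "bin");
            if (Directory.Exists(systemBin))
                return systemBin;
        }

        return null;
    }
}
```

A caller could pass AppContext.BaseDirectory and the value of the CUDA_PATH environment variable, then make the returned directory reachable for native library loading before the backend is loaded.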

When you publish your MAUI app, is it acceptable for it to contain more than one native library?

Sure.

And when the user installs the app, is it possible to put the files into a specified path?

Yes.

As a MAUI developer, what would be the best way for you to develop and publish a MAUI app? (You don't need to take feasibility into account.)

I don't have much expertise in MAUI. But I would say that developers expect to be able to consume components without having to implement any platform-specific logic. I would consider LLamaSharp a reusable component, compatible with macOS and Windows.

That being said, I see 2 major deployment models:

1- One with the constraint that, to run on Windows, the CUDA runtime must be installed on the system. This is the scenario LLamaSharp currently fails to address, but it is easily resolved with the approach I'm suggesting.

2- One in which all runtimes are managed and shipped within the deployed app. Just a guess: I believe the CUDA runtime must be shipped within the application package, and that runtime must be compatible with the llama.cpp backend consumed by LLamaSharp. This is where the new .WithLibrary() method comes in.

And for debugging and prototyping purposes, I would say the first scenario must be covered.

So, to sum up: LLamaSharp implements an automatic backend loading mechanism based on the OS configuration (referring to the CUDA_PATH environment variable you're parsing). For it to work consistently within MAUI on Windows, it should also guarantee that the bin files located within this retrieved path are reachable by the LoadLibrary* Windows APIs, which are used under the hood when the llama.cpp backend is loaded.

Let me know if you need further clarification or any test from my end.

evolcano commented 6 months ago

@AsakusaRinne Yes! It works in MAUI on my Windows 11 machine! Please add it to LLamaSharp.

Thank you, @lcarrere

AsakusaRinne commented 6 months ago

@evolcano Glad to hear that LLamaSharp works well with MAUI! I'm sorry that I'm not familiar with MAUI and don't have much time to study it yet. I won't reject a PR if you'd like to add an integration or an example for it. :)

AmSmart commented 4 months ago

@AsakusaRinne Hasn't this issue been fixed by #631? I think it can be closed now.

SciSharp / LLamaSharp

'The type initializer for 'LLama.Native.NativeApi' threw an exception' in MAUI App #180