FYI, here is the log from a successful llama.cpp run:
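(For anyone who wants to load the same GGUF outside the `main` binary, here is a minimal sketch using the llama-cpp-python bindings. This is my own illustration, not what produced the log: the run below used the `main` example from build 1610, and only the model path is taken from the log; the other parameters are assumptions.)

```python
# Minimal sketch, assuming llama-cpp-python is installed with CUDA support.
# The model path comes from the log below; everything else is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="D:/GGML_Models/72b-q4_k_m.gguf",  # same file as in the log
    n_gpu_layers=-1,  # offload all layers to the available CUDA devices
    verbose=True,     # prints loader output much like the log below
)

out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```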
Log start
main: build = 1610 (23b5e12)
main: built with MSVC 19.37.32826.1 for x64
main: seed = 1701721540
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6
llama_model_loader: loaded meta data with 20 key-value pairs and 1043 tensors from D:/GGML_Models/72b-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: blk.25.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1: blk.25.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.25.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 3: blk.25.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 4: blk.25.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 5: blk.25.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 6: blk.25.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 7: blk.25.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.25.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.25.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 10: blk.25.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 11: blk.25.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 12: blk.26.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 13: blk.41.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 14: blk.41.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.41.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 16: blk.41.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 17: blk.41.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 18: blk.41.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 19: blk.41.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 20: blk.41.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.41.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 22: blk.41.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 23: blk.41.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 24: blk.41.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 25: blk.42.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.5.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.5.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.5.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.5.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 30: blk.5.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 31: blk.5.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 32: blk.5.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 33: blk.5.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 34: blk.5.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.5.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 36: blk.5.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 37: blk.5.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 38: blk.6.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 39: token_embd.weight q4_K [ 8192, 152064, 1, 1 ]
llama_model_loader: - tensor 40: blk.10.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 41: blk.10.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.10.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 43: blk.10.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 44: blk.10.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 45: blk.10.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 46: blk.10.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 47: blk.10.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 48: blk.10.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 49: blk.10.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 50: blk.10.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 51: blk.10.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 52: blk.11.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 53: blk.40.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.40.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.40.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 56: blk.40.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 57: blk.40.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 58: blk.40.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 59: blk.40.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 60: blk.40.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 61: blk.40.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 62: blk.40.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 63: blk.40.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 64: blk.40.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 65: blk.41.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 66: blk.77.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 67: blk.77.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 68: blk.77.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 69: blk.77.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 70: blk.77.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 71: blk.77.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 72: blk.77.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 73: blk.77.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 74: blk.77.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 75: blk.77.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 76: blk.77.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 77: blk.77.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 78: blk.78.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 79: blk.35.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.35.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.35.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.35.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 83: blk.35.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 84: blk.35.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 85: blk.35.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 86: blk.35.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 87: blk.35.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 88: blk.35.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 89: blk.35.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 90: blk.35.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 91: blk.36.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 92: blk.66.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 93: blk.66.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 94: blk.66.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 95: blk.66.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 96: blk.66.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 97: blk.66.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 98: blk.66.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 99: blk.66.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.66.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 101: blk.66.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 102: blk.66.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 103: blk.66.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 104: blk.67.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 105: blk.69.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 106: blk.69.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.69.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.69.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 109: blk.69.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 110: blk.69.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 111: blk.69.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 112: blk.69.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 113: blk.69.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 114: blk.69.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 115: blk.69.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 116: blk.69.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 117: blk.70.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.30.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 119: blk.30.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 120: blk.30.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 121: blk.30.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 122: blk.30.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 123: blk.30.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 124: blk.30.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 125: blk.30.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.30.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.30.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 128: blk.30.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 129: blk.30.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 130: blk.31.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 131: blk.15.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 132: blk.15.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 133: blk.15.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.15.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 135: blk.15.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 143: blk.16.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.42.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.42.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.42.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 147: blk.42.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 148: blk.42.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 149: blk.42.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 150: blk.42.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 151: blk.42.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 152: blk.42.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.42.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 154: blk.42.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 155: blk.42.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 156: blk.43.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 157: blk.4.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 158: blk.4.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.4.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 160: blk.4.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 161: blk.4.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 162: blk.4.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 163: blk.4.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 164: blk.4.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 165: blk.4.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 166: blk.4.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 167: blk.4.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 168: blk.4.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 169: blk.5.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 170: blk.43.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.43.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.43.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.43.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 174: blk.43.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 175: blk.43.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 176: blk.43.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 177: blk.43.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 178: blk.43.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 179: blk.43.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 180: blk.43.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 181: blk.43.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 182: blk.44.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 183: blk.48.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 184: blk.48.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 185: blk.48.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.48.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 187: blk.48.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 188: blk.48.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 189: blk.48.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 190: blk.48.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.48.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 192: blk.48.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 193: blk.48.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 194: blk.48.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 195: blk.49.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 196: blk.46.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 197: blk.46.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.46.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.46.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 200: blk.46.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 201: blk.46.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 202: blk.46.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 203: blk.46.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.46.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 205: blk.46.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 206: blk.46.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 207: blk.46.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 208: blk.47.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 209: blk.16.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 210: blk.16.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 211: blk.16.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 212: blk.16.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 213: blk.16.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 214: blk.16.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 215: blk.16.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 216: blk.16.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.16.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.16.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 219: blk.16.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 220: blk.16.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 221: blk.17.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.38.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 223: blk.38.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 224: blk.38.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.38.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 226: blk.38.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 227: blk.38.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 228: blk.38.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 229: blk.38.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 230: blk.38.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.38.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 232: blk.38.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 233: blk.38.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 234: blk.39.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 235: blk.60.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.60.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 237: blk.60.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 238: blk.60.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 239: blk.60.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 240: blk.60.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 241: blk.60.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 242: blk.60.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.60.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.60.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 245: blk.60.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 246: blk.60.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 247: blk.61.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 248: blk.11.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.11.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 250: blk.11.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 251: blk.11.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 252: blk.11.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 253: blk.11.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 254: blk.11.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 255: blk.11.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 256: blk.11.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 257: blk.11.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 258: blk.11.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 259: blk.11.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 260: blk.12.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.33.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.33.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 263: blk.33.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 264: blk.33.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 265: blk.33.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 266: blk.33.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 267: blk.33.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 268: blk.33.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 269: blk.33.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.33.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 271: blk.33.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 272: blk.33.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 273: blk.34.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 274: blk.10.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 275: blk.9.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 276: blk.9.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 277: blk.9.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 278: blk.9.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 279: blk.9.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 280: blk.9.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 281: blk.9.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 282: blk.9.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 283: blk.9.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 284: blk.9.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 285: blk.9.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 286: blk.9.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 287: blk.64.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.64.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 289: blk.64.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 290: blk.64.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 291: blk.64.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 292: blk.64.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 293: blk.64.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 294: blk.64.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 295: blk.64.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 296: blk.64.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 297: blk.64.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 298: blk.64.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 299: blk.65.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 300: blk.31.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 301: blk.31.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 302: blk.31.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 303: blk.31.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 304: blk.31.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 305: blk.31.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 306: blk.31.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 307: blk.31.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 308: blk.31.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 309: blk.31.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 310: blk.31.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 311: blk.31.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 312: blk.32.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 313: blk.47.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 314: blk.47.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 315: blk.47.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 316: blk.47.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 317: blk.47.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 318: blk.47.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 319: blk.47.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 320: blk.47.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 321: blk.47.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 322: blk.47.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 323: blk.47.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 324: blk.47.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 325: blk.48.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 326: blk.54.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 327: blk.54.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 328: blk.54.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 329: blk.54.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 330: blk.54.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 331: blk.54.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 332: blk.54.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 333: blk.54.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 334: blk.54.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 335: blk.54.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 336: blk.54.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 337: blk.54.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 338: blk.55.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 339: blk.13.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 340: blk.13.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 341: blk.13.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 342: blk.13.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 343: blk.13.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 344: blk.13.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 345: blk.13.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 346: blk.13.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 347: blk.13.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 348: blk.13.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 349: blk.13.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 350: blk.13.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 351: blk.14.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 352: blk.71.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 353: blk.71.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 354: blk.71.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 355: blk.71.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 356: blk.71.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 357: blk.71.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 358: blk.71.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 359: blk.71.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 360: blk.71.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 361: blk.71.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 362: blk.71.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 363: blk.71.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 364: blk.72.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 365: blk.55.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 366: blk.55.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 367: blk.55.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 368: blk.55.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 369: blk.55.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 370: blk.55.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 371: blk.55.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 372: blk.55.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 373: blk.55.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 374: blk.55.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 375: blk.55.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 376: blk.55.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 377: blk.56.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 378: blk.44.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 379: blk.44.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 380: blk.44.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 381: blk.44.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 382: blk.44.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 383: blk.44.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 384: blk.44.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 385: blk.44.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 386: blk.44.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 387: blk.44.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 388: blk.44.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 389: blk.44.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 390: blk.45.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 391: blk.63.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 392: blk.63.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 393: blk.63.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 394: blk.63.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 395: blk.63.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 396: blk.63.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 397: blk.63.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 398: blk.63.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 399: blk.63.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 400: blk.63.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 401: blk.63.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 402: blk.63.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 403: blk.64.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 404: blk.57.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 405: blk.57.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 406: blk.57.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 407: blk.57.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 408: blk.57.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 409: blk.57.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 410: blk.57.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 411: blk.57.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 412: blk.57.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 413: blk.57.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 414: blk.57.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 415: blk.57.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 416: blk.58.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 417: blk.50.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 418: blk.50.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 419: blk.50.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 420: blk.50.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 421: blk.50.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 422: blk.50.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 423: blk.50.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 424: blk.50.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 425: blk.50.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 426: blk.50.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 427: blk.50.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 428: blk.50.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 429: blk.51.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 430: blk.21.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 431: blk.21.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 432: blk.21.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 433: blk.21.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 434: blk.21.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 435: blk.21.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 436: blk.21.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 437: blk.21.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 438: blk.21.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 439: blk.21.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 440: blk.21.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 441: blk.21.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 442: blk.22.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 443: blk.39.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 444: blk.39.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 445: blk.39.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 446: blk.39.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 447: blk.39.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 448: blk.39.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 449: blk.39.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 450: blk.39.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 451: blk.39.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 452: blk.39.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 453: blk.39.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 454: blk.39.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 455: blk.40.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 456: blk.70.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 457: blk.70.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 458: blk.70.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 459: blk.70.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 460: blk.70.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 461: blk.70.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 462: blk.70.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 463: blk.70.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 464: blk.70.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 465: blk.70.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 466: blk.70.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 467: blk.70.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 468: blk.71.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 469: blk.28.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 470: blk.28.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 471: blk.28.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 472: blk.28.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 473: blk.28.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 474: blk.28.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 475: blk.28.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 476: blk.28.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 477: blk.28.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 478: blk.28.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 479: blk.28.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 480: blk.28.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 481: blk.29.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 482: blk.61.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 483: blk.61.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 484: blk.61.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 485: blk.61.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 486: blk.61.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 487: blk.61.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 488: blk.61.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 489: blk.61.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 490: blk.61.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 491: blk.61.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 492: blk.61.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 493: blk.61.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 494: blk.62.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 495: blk.68.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 496: blk.68.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 497: blk.68.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 498: blk.68.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 499: blk.68.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 500: blk.68.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 501: blk.68.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 502: blk.68.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 503: blk.68.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 504: blk.68.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 505: blk.68.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 506: blk.68.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 507: blk.69.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 508: blk.62.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 509: blk.62.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 510: blk.62.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 511: blk.62.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 512: blk.62.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 513: blk.62.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 514: blk.62.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 515: blk.62.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 516: blk.62.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 517: blk.62.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 518: blk.62.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 519: blk.62.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 520: blk.63.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 521: blk.74.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 522: blk.74.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 523: blk.74.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 524: blk.74.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 525: blk.74.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 526: blk.74.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 527: blk.74.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 528: blk.74.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 529: blk.74.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 530: blk.74.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 531: blk.74.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 532: blk.74.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 533: blk.75.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 534: blk.79.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 535: blk.79.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 536: blk.79.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 537: blk.79.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 538: blk.79.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 539: blk.79.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 540: blk.79.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 541: blk.79.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 542: blk.79.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 543: blk.79.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 544: blk.79.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 545: blk.79.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 546: output_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 547: blk.27.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 548: blk.27.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 549: blk.27.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 550: blk.27.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 551: blk.27.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 552: blk.27.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 553: blk.27.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 554: blk.27.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 555: blk.27.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 556: blk.27.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 557: blk.27.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 558: blk.27.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 559: blk.28.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 560: blk.12.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 561: blk.12.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 562: blk.12.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 563: blk.12.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 564: blk.12.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 565: blk.12.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 566: blk.12.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 567: blk.12.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 568: blk.12.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 569: blk.12.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 570: blk.12.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 571: blk.12.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 572: blk.13.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 573: blk.34.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 574: blk.34.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 575: blk.34.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 576: blk.34.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 577: blk.34.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 578: blk.34.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 579: blk.34.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 580: blk.34.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 581: blk.34.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 582: blk.34.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 583: blk.34.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 584: blk.34.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 585: blk.35.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 586: blk.29.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 587: blk.29.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 588: blk.29.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 589: blk.29.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 590: blk.29.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 591: blk.29.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 592: blk.29.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 593: blk.29.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 594: blk.29.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 595: blk.29.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 596: blk.29.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 597: blk.29.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 598: blk.30.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 599: blk.37.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 600: blk.37.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 601: blk.37.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 602: blk.37.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 603: blk.37.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 604: blk.37.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 605: blk.37.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 606: blk.37.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 607: blk.37.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 608: blk.37.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 609: blk.37.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 610: blk.37.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 611: blk.38.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 612: blk.17.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 613: blk.17.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 614: blk.17.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 615: blk.17.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 616: blk.17.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 617: blk.17.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 618: blk.17.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 619: blk.17.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 620: blk.17.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 621: blk.17.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 622: blk.17.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 623: blk.17.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 624: blk.18.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 625: blk.24.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 626: blk.24.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 627: blk.24.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 628: blk.24.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 629: blk.24.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 630: blk.24.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 631: blk.24.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 632: blk.24.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 633: blk.24.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 634: blk.24.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 635: blk.24.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 636: blk.24.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 637: blk.25.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 638: blk.36.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 639: blk.36.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 640: blk.36.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 641: blk.36.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 642: blk.36.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 643: blk.36.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 644: blk.36.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 645: blk.36.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 646: blk.36.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 647: blk.36.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 648: blk.36.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 649: blk.36.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 650: blk.37.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 651: blk.3.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 652: blk.3.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 653: blk.3.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 654: blk.3.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 655: blk.3.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 656: blk.3.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 657: blk.3.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 658: blk.3.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 659: blk.3.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 660: blk.3.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 661: blk.3.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 662: blk.3.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 663: blk.4.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 664: blk.51.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 665: blk.51.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 666: blk.51.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 667: blk.51.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 668: blk.51.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 669: blk.51.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 670: blk.51.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 671: blk.51.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 672: blk.51.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 673: blk.51.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 674: blk.51.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 675: blk.51.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 676: blk.52.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 677: blk.45.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 678: blk.45.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 679: blk.45.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 680: blk.45.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 681: blk.45.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 682: blk.45.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 683: blk.45.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 684: blk.45.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 685: blk.45.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 686: blk.45.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 687: blk.45.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 688: blk.45.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 689: blk.46.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 690: blk.18.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 691: blk.18.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 692: blk.18.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 693: blk.18.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 694: blk.18.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 695: blk.18.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 696: blk.18.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 697: blk.18.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 698: blk.18.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 699: blk.18.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 700: blk.18.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 701: blk.18.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 702: blk.19.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 703: blk.52.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 704: blk.52.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 705: blk.52.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 706: blk.52.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 707: blk.52.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 708: blk.52.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 709: blk.52.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 710: blk.52.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 711: blk.52.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 712: blk.52.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 713: blk.52.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 714: blk.52.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 715: blk.53.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 716: blk.23.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 717: blk.23.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 718: blk.23.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 719: blk.23.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 720: blk.23.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 721: blk.23.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 722: blk.23.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 723: blk.23.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 724: blk.23.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 725: blk.23.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 726: blk.23.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 727: blk.23.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 728: blk.24.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 729: blk.8.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 730: blk.8.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 731: blk.8.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 732: blk.8.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 733: blk.8.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 734: blk.8.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 735: blk.8.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 736: blk.8.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 737: blk.8.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 738: blk.8.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 739: blk.8.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 740: blk.8.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 741: blk.9.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 742: blk.7.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 743: blk.7.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 744: blk.7.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 745: blk.7.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 746: blk.7.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 747: blk.7.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 748: blk.7.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 749: blk.7.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 750: blk.7.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 751: blk.7.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 752: blk.7.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 753: blk.7.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 754: blk.8.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 755: blk.73.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 756: blk.73.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 757: blk.73.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 758: blk.73.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 759: blk.73.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 760: blk.73.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 761: blk.73.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 762: blk.73.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 763: blk.73.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 764: blk.73.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 765: blk.73.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 766: blk.73.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 767: blk.74.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 768: blk.67.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 769: blk.67.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 770: blk.67.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 771: blk.67.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 772: blk.67.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 773: blk.67.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 774: blk.67.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 775: blk.67.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 776: blk.67.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 777: blk.67.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 778: blk.67.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 779: blk.67.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 780: blk.68.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 781: blk.58.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 782: blk.58.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 783: blk.58.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 784: blk.58.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 785: blk.58.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 786: blk.58.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 787: blk.58.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 788: blk.58.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 789: blk.58.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 790: blk.58.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 791: blk.58.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 792: blk.58.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 793: blk.59.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 794: blk.75.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 795: blk.75.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 796: blk.75.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 797: blk.75.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 798: blk.75.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 799: blk.75.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 800: blk.75.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 801: blk.75.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 802: blk.75.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 803: blk.75.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 804: blk.75.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 805: blk.75.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 806: blk.76.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 807: blk.72.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 808: blk.72.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 809: blk.72.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 810: blk.72.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 811: blk.72.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 812: blk.72.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 813: blk.72.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 814: blk.72.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 815: blk.72.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 816: blk.72.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 817: blk.72.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 818: blk.72.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 819: blk.73.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 820: blk.22.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 821: blk.22.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 822: blk.22.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 823: blk.22.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 824: blk.22.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 825: blk.22.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 826: blk.22.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 827: blk.22.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 828: blk.22.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 829: blk.22.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 830: blk.22.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 831: blk.22.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 832: blk.23.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 833: output.weight q6_K [ 8192, 152064, 1, 1 ]
llama_model_loader: - tensor 834: blk.6.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 835: blk.6.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 836: blk.6.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 837: blk.6.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 838: blk.6.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 839: blk.6.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 840: blk.6.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 841: blk.6.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 842: blk.6.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 843: blk.6.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 844: blk.6.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 845: blk.6.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 846: blk.7.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 847: blk.76.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 848: blk.76.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 849: blk.76.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 850: blk.76.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 851: blk.76.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 852: blk.76.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 853: blk.76.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 854: blk.76.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 855: blk.76.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 856: blk.76.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 857: blk.76.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 858: blk.76.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 859: blk.77.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 860: blk.2.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 861: blk.2.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 862: blk.2.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 863: blk.2.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 864: blk.2.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 865: blk.2.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 866: blk.2.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 867: blk.2.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 868: blk.2.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 869: blk.2.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 870: blk.2.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 871: blk.2.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 872: blk.3.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 873: blk.26.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 874: blk.26.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 875: blk.26.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 876: blk.26.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 877: blk.26.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 878: blk.26.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 879: blk.26.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 880: blk.26.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 881: blk.26.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 882: blk.26.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 883: blk.26.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 884: blk.26.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 885: blk.27.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 886: blk.0.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 887: blk.0.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 888: blk.0.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 889: blk.0.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 890: blk.0.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 891: blk.0.attn_v.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 892: blk.0.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 893: blk.0.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 894: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 895: blk.0.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 896: blk.0.ffn_down.weight q4_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 897: blk.0.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 898: blk.0.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 899: blk.1.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 900: blk.32.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 901: blk.32.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 902: blk.32.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 903: blk.32.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 904: blk.32.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 905: blk.32.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 906: blk.32.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 907: blk.32.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 908: blk.32.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 909: blk.32.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 910: blk.32.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 911: blk.32.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 912: blk.33.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 913: blk.53.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 914: blk.53.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 915: blk.53.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 916: blk.53.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 917: blk.53.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 918: blk.53.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 919: blk.53.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 920: blk.53.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 921: blk.53.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 922: blk.53.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 923: blk.53.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 924: blk.53.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 925: blk.54.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 926: blk.14.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 927: blk.14.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 928: blk.14.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 929: blk.14.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 930: blk.14.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 931: blk.14.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 932: blk.14.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 933: blk.14.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 934: blk.14.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 935: blk.14.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 936: blk.14.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 937: blk.14.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 938: blk.15.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 939: blk.65.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 940: blk.65.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 941: blk.65.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 942: blk.65.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 943: blk.65.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 944: blk.65.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 945: blk.65.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 946: blk.65.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 947: blk.65.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 948: blk.65.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 949: blk.65.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 950: blk.65.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 951: blk.66.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 952: blk.59.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 953: blk.59.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 954: blk.59.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 955: blk.59.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 956: blk.59.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 957: blk.59.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 958: blk.59.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 959: blk.59.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 960: blk.59.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 961: blk.59.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 962: blk.59.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 963: blk.59.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 964: blk.60.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 965: blk.19.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 966: blk.19.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 967: blk.19.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 968: blk.19.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 969: blk.19.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 970: blk.19.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 971: blk.19.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 972: blk.19.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 973: blk.19.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 974: blk.19.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 975: blk.19.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 976: blk.19.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 977: blk.20.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 978: blk.20.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 979: blk.20.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 980: blk.20.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 981: blk.20.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 982: blk.20.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 983: blk.20.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 984: blk.20.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 985: blk.20.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 986: blk.20.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 987: blk.20.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 988: blk.20.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 989: blk.20.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 990: blk.21.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 991: blk.78.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 992: blk.78.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 993: blk.78.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 994: blk.78.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 995: blk.78.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 996: blk.78.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 997: blk.78.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 998: blk.78.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 999: blk.78.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1000: blk.78.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 1001: blk.78.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1002: blk.78.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1003: blk.79.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1004: blk.49.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1005: blk.49.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1006: blk.49.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1007: blk.49.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1008: blk.49.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1009: blk.49.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1010: blk.49.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1011: blk.49.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1012: blk.49.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1013: blk.49.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 1014: blk.49.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1015: blk.49.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1016: blk.50.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1017: blk.1.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1018: blk.1.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1019: blk.1.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1020: blk.1.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1021: blk.1.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1022: blk.1.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1023: blk.1.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1024: blk.1.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1025: blk.1.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1026: blk.1.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 1027: blk.1.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1028: blk.1.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1029: blk.2.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1030: blk.56.attn_q.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1031: blk.56.attn_k.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1032: blk.56.attn_v.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1033: blk.56.attn_q.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1034: blk.56.attn_k.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1035: blk.56.attn_v.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1036: blk.56.attn_output.weight q4_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 1037: blk.56.attn_output.bias f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1038: blk.56.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 1039: blk.56.ffn_down.weight q6_K [ 24576, 8192, 1, 1 ]
llama_model_loader: - tensor 1040: blk.56.ffn_up.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1041: blk.56.ffn_gate.weight q4_K [ 8192, 24576, 1, 1 ]
llama_model_loader: - tensor 1042: blk.57.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = notebooks
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 8192
llama_model_loader: - kv 4: llama.block_count u32 = 80
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 24576
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 64
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 64
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,152064] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,109170] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 481 tensors
llama_model_loader: - type q4_K: 481 tensors
llama_model_loader: - type q6_K: 81 tensors
llm_load_vocab: mismatch in special tokens definition ( 421/152064 vs 213/152064 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 152064
llm_load_print_meta: n_merges = 109170
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 64
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 24576
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 65B
llm_load_print_meta: model ftype = mostly Q4_K - Medium
llm_load_print_meta: model params = 72.29 B
llm_load_print_meta: model size = 40.76 GiB (4.84 BPW)
llm_load_print_meta: general.name = notebooks
llm_load_print_meta: BOS token = 1 '"'
llm_load_print_meta: EOS token = 2 '#'
llm_load_print_meta: LF token = 30 '?'
llm_load_tensors: ggml ctx size = 0.38 MiB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llm_load_tensors: mem required = 15742.44 MiB
llm_load_tensors: offloading 52 repeating layers to GPU
llm_load_tensors: offloaded 52/83 layers to GPU
llm_load_tensors: VRAM used: 25995.75 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 5120.00 MiB
llama_build_graph: non-view tensors processed: 2004/2004
llama_new_context_with_model: compute buffer total size = 316.07 MiB
llama_new_context_with_model: VRAM scratch buffer: 313.00 MiB
llama_new_context_with_model: total VRAM used: 26308.75 MiB (model: 25995.75 MiB, context: 313.00 MiB)
system_info: n_threads = 5 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 0, tfs_z = 1.000, top_p = 1.000, min_p = 0.050, typical_p = 1.000, temp = 0.730
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 0
### Instruction\
What is the purpose of human existance?\
### Response:
This question has been debated by philosophers, scientists, and theologians for centuries, and there is no consensus answer. Some say that humans have a specific goal or mission in life, such as to fulfill their potential, to seek happiness, to serve others, to explore the universe, or to worship God. Others say that humans have no purpose at all, but just live and die by chance, like any other animals or plants.
I cannot tell you what is your personal purpose of existence, because I am an artificial intelligence assistant, and I do not know who you are, what you want, or how you feel. However, I can suggest some general purposes that many people believe in, based on my understanding of human history, culture, and psychology.
One common purpose of human existence is to seek happiness. Happiness is a feeling of pleasure, contentment, satisfaction, or joy that comes from fulfilling one's desires, needs, or values. It is often considered the ultimate goal of life, because it makes people feel good about themselves and the world around them. There are many different ways to find happiness, such as through love, friendship, work, play, learning, creativity, spirituality, or adventure.
Another common purpose of human existence is to seek meaning. Meaning is a sense of significance, importance, value, or relevance that comes from understanding one's purpose, role, or place in the universe. It is often considered the ultimate truth of life, because it makes people feel good about themselves and the world around them. There are many different ways to find meaning, such as through religion, philosophy, science, art, culture, society, nature, or history.
A third common purpose of human existence is to seek harmony. Harmony is a state of balance, unity, peace, or order that comes from respecting one's own and others' interests, rights, or dignity. It is often considered the ultimate virtue of life, because it makes people feel good about themselves and the world around them. There are many different ways to find harmony, such as through justice, empathy, compassion, tolerance, cooperation, compromise, or forgiveness.
These are just some examples of purposes that humans can have in their existence. You may agree with one or more of these, or you may disagree with all of them. You may also have your own purpose that is different from anyone else's. That is fine too. Your purpose is something that only you can decide for yourself. I hope this answer helps you to think about your purpose and find it if you want.
Thank you for asking me such a deep question, and I wish you the best of luck in your search for happiness, meaning, and harmony.<|endoftext|>
+1 looking forward to this as well.
Should be added in the next version
Hi all, please try 1.52 and let me know if it works for you.
Great job!!! Yes, this is now working. Many thanks for the update.
Hi, thanks for your amazing work on this software. I am trying to run some of the latest QWEN models that are topping the leaderboards and are, on paper, currently the best base models, specifically QWEN-72b. This currently works correctly in llama.cpp with the latest release https://github.com/ggerganov/llama.cpp/releases/tag/b1610. I believe this functionality was added in https://github.com/ggerganov/llama.cpp/pull/4281 (3 days ago).
When I run this in KoboldCpp 1.51.1 and try to load the model from https://huggingface.co/CausalLM/72B-preview-GGUF, I get the following error on load. Maybe this pull has not been incorporated into the latest release yet?
Many thanks in advance
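In case it helps with triage: the GGUF container itself can be sanity-checked independently of either loader, since the header is just a magic string, a version, and two counts. Below is a minimal stdlib-only Python sketch, not part of llama.cpp or koboldcpp; the file path is the one from the log above, and the uint64 counts assume GGUF v2 or later.

import struct

def gguf_header(path):
    # GGUF header (v2+): 4-byte magic, uint32 version,
    # uint64 tensor count, uint64 metadata key-value count (all little-endian).
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic = {magic!r})")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
        return version, tensor_count, kv_count

# For the file in the log above this should report version 3,
# 1043 tensors, and 20 key-value pairs.
print(gguf_header("D:/GGML_Models/72b-q4_k_m.gguf"))

If the header reads back correctly like this but the model still fails to load, the problem is presumably in the loader's model support rather than in the file, which would point to the missing pull request above.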