ggerganov / llama.cpp

LLM inference in C/C++
MIT License

When I try to do finetuning I get a GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS error. #4342

Closed · Taikono-Himazin closed this 11 months ago

Taikono-Himazin commented 11 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

Finetuning Llama 2 70B should succeed.

Current Behavior

Finetuning Llama 2 70B fails with the GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS error shown below.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

$ system_profiler SPHardwareDataType
Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac14,14
      Model Number: G180LJ/A
      Chip: Apple M2 Ultra
      Total Number of Cores: 24 (16 performance and 8 efficiency)
      Memory: 192 GB
      System Firmware Version: 10151.41.12
      OS Loader Version: 10151.41.12
      Serial Number (system): xxxxxxxxxxxxxxxx
      Hardware UUID: xxxxxxxxxxxxxxxxx
      Provisioning UDID: xxxxxxxxxxxxxxxxxx
      Activation Lock Status: Disabled
$ uname -a
Darwin xxxxxxxxx-MacStudio 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020 arm64
$ python3 --version
Python 3.11.5

$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

Please help provide information about the failure / bug.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. I tried to finetune this model and ran the following command.
    ./finetune \
    --model-base hf_downloads/japanese-stablelm-instruct-beta-70b.Q8_0.gguf \
    --checkpoint-out finetuning-ITERATION.gguf \
    --lora-out finetuning-LoRA-ITERATION.bin \
    --train-data ./training/datasets/data.txt \
    --save-every 10 --threads 32 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing

Doing so results in the following error.

GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS
zsh: abort      ./finetune --model-base  --checkpoint-out emploee_list-ITERATION.gguf      10

It worked fine with the 7B model, so it seems to be an error that occurs as the model scale increases. What does this limit mean? What happens if I increase the number?
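
My rough guess (an assumption on my part, not something I verified in the code): ggml's optimizer caps the number of tensors marked as trainable at GGML_MAX_PARAMS, and finetune seems to create a LoRA A/B pair for every weight it adapts, so the count grows with the model's tensor count. The loader log below reports 723 tensors for this 70B model. Here is a small back-of-the-envelope sketch of that reasoning; the per-layer breakdown, the 2x LoRA factor, and the 1024 limit are placeholders for illustration, not values taken from ggml.h:

    /*
     * Hypothetical estimate only -- this is NOT code from ggml.c. The real
     * check is GGML_ASSERT(np < GGML_MAX_PARAMS) in ggml's optimizer, and the
     * real constant lives in ggml.h; the 1024 below is an assumed placeholder.
     */
    #include <stdio.h>

    #define ASSUMED_GGML_MAX_PARAMS 1024   /* placeholder for GGML_MAX_PARAMS */

    /* Guess: finetune marks a LoRA A and B tensor for every adapted weight,
     * so the trainable-tensor count is roughly twice the model tensor count
     * (9 adapted weights per layer plus 3 global weights). */
    static int estimate_trainable_tensors(int n_layers) {
        const int per_layer = 9;  /* attn_{q,k,v,output,norm}, ffn_{gate,down,up,norm} */
        const int global    = 3;  /* token_embd, output_norm, output */
        return 2 * (n_layers * per_layer + global);
    }

    int main(void) {
        const struct { const char *name; int n_layers; } models[] = {
            { "LLaMA 7B",  32 },  /* ~291 model tensors */
            { "LLaMA 70B", 80 },  /* 723 model tensors, matching the loader log */
        };
        for (int i = 0; i < 2; ++i) {
            const int np = estimate_trainable_tensors(models[i].n_layers);
            printf("%-10s -> ~%d trainable tensors (%s assumed limit of %d)\n",
                   models[i].name, np,
                   np < ASSUMED_GGML_MAX_PARAMS ? "below" : "above",
                   ASSUMED_GGML_MAX_PARAMS);
        }
        return 0;
    }

If that estimate is roughly right, the 70B model would need about 1446 trainable tensors while the 7B model needs only about 582, which would explain why only the larger model trips the assert.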

I thought this problem might be related, but it seems to be a different error.

Thank you!

Taikono-Himazin commented 11 months ago

I will post the log in parts due to the character limit.

Failure Logs

llama_model_loader: loaded meta data with 19 key-value pairs and 723 tensors from hf_downloads/japanese-stablelm-instruct-beta-70b.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q8_0     [  8192, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor    8:              blk.0.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor    9:              blk.0.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   10:           blk.1.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   11:            blk.1.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   12:            blk.1.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   13:              blk.1.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   14:            blk.1.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   15:              blk.1.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   16:         blk.1.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   17:              blk.1.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   18:              blk.1.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   19:           blk.2.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   20:            blk.2.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   21:            blk.2.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   22:              blk.2.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   23:            blk.2.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   24:              blk.2.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   25:         blk.2.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   26:              blk.2.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   27:              blk.2.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   28:           blk.3.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   29:            blk.3.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   30:            blk.3.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   31:              blk.3.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   32:            blk.3.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   33:              blk.3.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   34:         blk.3.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   35:              blk.3.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   36:              blk.3.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   37:           blk.4.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   38:            blk.4.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   39:            blk.4.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   40:              blk.4.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   41:            blk.4.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   42:              blk.4.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   43:         blk.4.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   44:              blk.4.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   45:              blk.4.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   46:            blk.5.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   47:              blk.5.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   48:         blk.5.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   49:              blk.5.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   50:              blk.5.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   51:          blk.10.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   52:           blk.10.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   53:           blk.10.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   54:             blk.10.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   55:           blk.10.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   56:             blk.10.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   57:        blk.10.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   58:             blk.10.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   59:             blk.10.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   60:             blk.11.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   61:        blk.11.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   62:             blk.11.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   63:             blk.11.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   64:           blk.5.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   65:            blk.5.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   66:              blk.5.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   67:            blk.5.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   68:           blk.6.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   69:            blk.6.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   70:            blk.6.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   71:              blk.6.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   72:            blk.6.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   73:              blk.6.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   74:         blk.6.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   75:              blk.6.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   76:              blk.6.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   77:           blk.7.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   78:            blk.7.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   79:            blk.7.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   80:              blk.7.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   81:            blk.7.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   82:              blk.7.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   83:         blk.7.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   84:              blk.7.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   85:              blk.7.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   86:           blk.8.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   87:            blk.8.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   88:            blk.8.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   89:              blk.8.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   90:            blk.8.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   91:              blk.8.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   92:         blk.8.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   93:              blk.8.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor   94:              blk.8.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor   95:           blk.9.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   96:            blk.9.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor   97:            blk.9.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   98:              blk.9.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor   99:            blk.9.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  100:              blk.9.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  101:         blk.9.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  102:              blk.9.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  103:              blk.9.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  104:          blk.11.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  105:           blk.11.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  106:           blk.11.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  107:             blk.11.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  109:          blk.12.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  110:           blk.12.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  111:           blk.12.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  112:             blk.12.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  113:           blk.12.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  114:             blk.12.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  115:        blk.12.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  116:             blk.12.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  117:             blk.12.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  118:          blk.13.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  119:           blk.13.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  120:           blk.13.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  121:             blk.13.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  122:           blk.13.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  123:             blk.13.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  124:        blk.13.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  125:             blk.13.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  126:             blk.13.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  127:          blk.14.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  128:           blk.14.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  129:           blk.14.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  130:             blk.14.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  131:           blk.14.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  132:             blk.14.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  133:        blk.14.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  134:             blk.14.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  135:             blk.14.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  136:          blk.15.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  137:           blk.15.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  138:           blk.15.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  139:             blk.15.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  140:           blk.15.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  141:             blk.15.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  142:        blk.15.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  143:             blk.15.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  144:             blk.15.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  145:          blk.16.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  146:           blk.16.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  147:           blk.16.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  148:             blk.16.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  149:           blk.16.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  150:             blk.16.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  151:        blk.16.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  152:             blk.16.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  153:             blk.16.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  154:          blk.17.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  155:           blk.17.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  156:           blk.17.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  157:             blk.17.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  158:           blk.17.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  159:             blk.17.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  160:        blk.17.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  161:             blk.17.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  162:             blk.17.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  163:          blk.18.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  164:           blk.18.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  165:           blk.18.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  166:             blk.18.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  167:           blk.18.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  168:             blk.18.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  169:        blk.18.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  170:             blk.18.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  171:             blk.18.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  172:          blk.19.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  173:           blk.19.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  174:           blk.19.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  175:             blk.19.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  176:           blk.19.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  177:             blk.19.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  178:        blk.19.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  179:             blk.19.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  180:             blk.19.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  181:          blk.20.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  182:           blk.20.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  183:           blk.20.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  184:             blk.20.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  185:           blk.20.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  186:             blk.20.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  187:        blk.20.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  188:             blk.20.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  189:             blk.20.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  190:          blk.21.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  191:           blk.21.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  192:           blk.21.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  193:             blk.21.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  194:           blk.21.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  195:             blk.21.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  196:        blk.21.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  197:             blk.21.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  198:             blk.21.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  199:           blk.22.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  200:             blk.22.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  201:             blk.22.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  202:        blk.22.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  203:             blk.22.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  204:             blk.22.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  205:          blk.22.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  206:           blk.22.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  207:           blk.22.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  208:          blk.23.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  209:           blk.23.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  210:           blk.23.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  211:             blk.23.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  212:           blk.23.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  213:             blk.23.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  214:        blk.23.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  215:             blk.23.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  216:             blk.23.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  217:          blk.24.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  218:           blk.24.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  219:           blk.24.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  220:             blk.24.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  221:           blk.24.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  222:             blk.24.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  223:        blk.24.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  224:             blk.24.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  225:             blk.24.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  226:          blk.25.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  227:           blk.25.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  228:           blk.25.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  229:             blk.25.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  230:           blk.25.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  231:             blk.25.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  232:        blk.25.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  233:             blk.25.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  234:             blk.25.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  235:          blk.26.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  236:           blk.26.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  237:           blk.26.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  238:             blk.26.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  239:           blk.26.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  240:             blk.26.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  241:        blk.26.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  242:             blk.26.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  243:             blk.26.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  244:          blk.27.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  245:           blk.27.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  246:           blk.27.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  247:             blk.27.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  248:           blk.27.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  249:             blk.27.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  250:        blk.27.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  251:             blk.27.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  252:             blk.27.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  253:           blk.28.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  254:             blk.28.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  255:        blk.28.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  256:             blk.28.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  257:             blk.28.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  258:          blk.28.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  259:           blk.28.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  260:             blk.28.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  261:           blk.28.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  262:          blk.29.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  263:           blk.29.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  264:           blk.29.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  265:             blk.29.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  266:           blk.29.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  267:             blk.29.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  268:        blk.29.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  269:             blk.29.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  270:             blk.29.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  271:          blk.30.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  272:           blk.30.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  273:           blk.30.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  274:             blk.30.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  275:           blk.30.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  276:             blk.30.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  277:        blk.30.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  278:             blk.30.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  279:             blk.30.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  280:          blk.31.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  281:           blk.31.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  282:           blk.31.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  283:             blk.31.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  284:           blk.31.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  285:             blk.31.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  286:        blk.31.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  287:             blk.31.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  288:             blk.31.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  289:          blk.32.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  290:           blk.32.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  291:           blk.32.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  292:             blk.32.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  293:           blk.32.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  294:             blk.32.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  295:        blk.32.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  296:             blk.32.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  297:             blk.32.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  298:          blk.33.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  299:           blk.33.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  300:           blk.33.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  301:             blk.33.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  302:           blk.33.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  303:             blk.33.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  304:        blk.33.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  305:             blk.33.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  306:             blk.33.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  307:             blk.34.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  308:        blk.34.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  309:             blk.34.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  310:             blk.34.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  311:          blk.34.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  312:           blk.34.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  313:           blk.34.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  314:             blk.34.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  315:           blk.34.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  316:          blk.35.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  317:           blk.35.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  318:           blk.35.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  319:             blk.35.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  320:           blk.35.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  321:             blk.35.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  322:        blk.35.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  323:             blk.35.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  324:             blk.35.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  325:          blk.36.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  326:           blk.36.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  327:           blk.36.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  328:             blk.36.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  329:           blk.36.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  330:             blk.36.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  331:        blk.36.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  332:             blk.36.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  333:             blk.36.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  334:          blk.37.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  335:           blk.37.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  336:           blk.37.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  337:             blk.37.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  338:           blk.37.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  339:             blk.37.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  340:        blk.37.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  341:             blk.37.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  342:             blk.37.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  343:          blk.38.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  344:           blk.38.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  345:           blk.38.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  346:             blk.38.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  347:           blk.38.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  348:             blk.38.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  349:        blk.38.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  350:             blk.38.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  351:             blk.38.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  352:          blk.39.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  353:           blk.39.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  354:           blk.39.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  355:             blk.39.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  356:           blk.39.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  357:             blk.39.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  358:        blk.39.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  359:             blk.39.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  360:             blk.39.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  361:          blk.40.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  362:           blk.40.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  363:           blk.40.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  364:             blk.40.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  365:           blk.40.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  366:             blk.40.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  367:        blk.40.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  368:             blk.40.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  369:             blk.40.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  370:          blk.41.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  371:           blk.41.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  372:           blk.41.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  373:             blk.41.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  374:           blk.41.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  375:             blk.41.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  376:        blk.41.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  377:             blk.41.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  378:             blk.41.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  379:          blk.42.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  380:           blk.42.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  381:           blk.42.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  382:             blk.42.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  383:           blk.42.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  384:             blk.42.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  385:        blk.42.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  386:             blk.42.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  387:             blk.42.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  388:          blk.43.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  389:           blk.43.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  390:           blk.43.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  391:             blk.43.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  392:           blk.43.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  393:             blk.43.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  394:        blk.43.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  395:             blk.43.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  396:             blk.43.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  397:          blk.44.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  398:           blk.44.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  399:           blk.44.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  400:             blk.44.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  401:           blk.44.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  402:             blk.44.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  403:        blk.44.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  404:             blk.44.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  405:             blk.44.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  406:           blk.45.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  407:             blk.45.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  408:             blk.45.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  409:        blk.45.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  410:             blk.45.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  411:             blk.45.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  412:          blk.45.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  413:           blk.45.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  414:           blk.45.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  415:          blk.46.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  416:           blk.46.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  417:           blk.46.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  418:             blk.46.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  419:           blk.46.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  420:             blk.46.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  421:        blk.46.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  422:             blk.46.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  423:             blk.46.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  424:          blk.47.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  425:           blk.47.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  426:           blk.47.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  427:             blk.47.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  428:           blk.47.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  429:             blk.47.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  430:        blk.47.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  431:             blk.47.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  432:             blk.47.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  433:          blk.48.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  434:           blk.48.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  435:           blk.48.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  436:             blk.48.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  437:           blk.48.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  438:             blk.48.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  439:        blk.48.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  440:             blk.48.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  441:             blk.48.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  442:          blk.49.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  443:           blk.49.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  444:           blk.49.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  445:             blk.49.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  446:           blk.49.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  447:             blk.49.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  448:        blk.49.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  449:             blk.49.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  450:             blk.49.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  451:          blk.50.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  452:           blk.50.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  453:           blk.50.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  454:             blk.50.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  455:           blk.50.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  456:             blk.50.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  457:        blk.50.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  458:             blk.50.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  459:             blk.50.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  460:           blk.51.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  461:             blk.51.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  462:        blk.51.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  463:             blk.51.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  464:             blk.51.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  465:          blk.51.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  466:           blk.51.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  467:             blk.51.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  468:           blk.51.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  469:          blk.52.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  470:           blk.52.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  471:           blk.52.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  472:             blk.52.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  473:           blk.52.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  474:             blk.52.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  475:        blk.52.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  476:             blk.52.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  477:             blk.52.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  478:          blk.53.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  479:           blk.53.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  480:           blk.53.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  481:             blk.53.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  482:           blk.53.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  483:             blk.53.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  484:        blk.53.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  485:             blk.53.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  486:             blk.53.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  487:          blk.54.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  488:           blk.54.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  489:           blk.54.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  490:             blk.54.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  491:           blk.54.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  492:             blk.54.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  493:        blk.54.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  494:             blk.54.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  495:             blk.54.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  496:          blk.55.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  497:           blk.55.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  498:           blk.55.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  499:             blk.55.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  500:           blk.55.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  501:             blk.55.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  502:        blk.55.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  503:             blk.55.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  504:             blk.55.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  505:          blk.56.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  506:           blk.56.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  507:           blk.56.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  508:             blk.56.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  509:           blk.56.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  510:             blk.56.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  511:        blk.56.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  512:             blk.56.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  513:             blk.56.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  514:             blk.57.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  515:        blk.57.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  516:             blk.57.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  517:             blk.57.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  518:          blk.57.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  519:           blk.57.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  520:           blk.57.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  521:             blk.57.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  522:           blk.57.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  523:          blk.58.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  524:           blk.58.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  525:           blk.58.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  526:             blk.58.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  527:           blk.58.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  528:             blk.58.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  529:        blk.58.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  530:             blk.58.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  531:             blk.58.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  532:          blk.59.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  533:           blk.59.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  534:           blk.59.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  535:             blk.59.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  536:           blk.59.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  537:             blk.59.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  538:        blk.59.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  539:             blk.59.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  540:             blk.59.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  541:          blk.60.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  542:           blk.60.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  543:           blk.60.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  544:             blk.60.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  545:           blk.60.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  546:             blk.60.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  547:        blk.60.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  548:             blk.60.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  549:             blk.60.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  550:          blk.61.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  551:           blk.61.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  552:           blk.61.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  553:             blk.61.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  554:           blk.61.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  555:             blk.61.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  556:        blk.61.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  557:             blk.61.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  558:             blk.61.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  559:          blk.62.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  560:           blk.62.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  561:           blk.62.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  562:             blk.62.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  563:           blk.62.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  564:             blk.62.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  565:        blk.62.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  566:             blk.62.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  567:             blk.62.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  568:          blk.63.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  569:           blk.63.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  570:           blk.63.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  571:             blk.63.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  572:           blk.63.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  573:             blk.63.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  574:        blk.63.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  575:             blk.63.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  576:             blk.63.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  577:          blk.64.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  578:           blk.64.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  579:           blk.64.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  580:             blk.64.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  581:           blk.64.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  582:             blk.64.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  583:        blk.64.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  584:             blk.64.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  585:             blk.64.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  586:          blk.65.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  587:           blk.65.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  588:           blk.65.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  589:             blk.65.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  590:           blk.65.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  591:             blk.65.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  592:        blk.65.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  593:             blk.65.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  594:             blk.65.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  595:          blk.66.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  596:           blk.66.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  597:           blk.66.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  598:             blk.66.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  599:           blk.66.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  600:             blk.66.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  601:        blk.66.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  602:             blk.66.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  603:             blk.66.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  604:          blk.67.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  605:           blk.67.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  606:           blk.67.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  607:             blk.67.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  608:           blk.67.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  609:             blk.67.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  610:        blk.67.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  611:             blk.67.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  612:             blk.67.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  613:           blk.68.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  614:             blk.68.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  615:             blk.68.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  616:        blk.68.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  617:             blk.68.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  618:             blk.68.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  619:          blk.68.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  620:           blk.68.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  621:           blk.68.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  622:          blk.69.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  623:           blk.69.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  624:           blk.69.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  625:             blk.69.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  626:           blk.69.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  627:             blk.69.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  628:        blk.69.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  629:             blk.69.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  630:             blk.69.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  631:          blk.70.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  632:           blk.70.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  633:           blk.70.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  634:             blk.70.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  635:           blk.70.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  636:             blk.70.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  637:        blk.70.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  638:             blk.70.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  639:             blk.70.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  640:          blk.71.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  641:           blk.71.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  642:           blk.71.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  643:             blk.71.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  644:           blk.71.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  645:             blk.71.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  646:        blk.71.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  647:             blk.71.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  648:             blk.71.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  649:          blk.72.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  650:           blk.72.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  651:           blk.72.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  652:             blk.72.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  653:           blk.72.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  654:             blk.72.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  655:        blk.72.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  656:             blk.72.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  657:             blk.72.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  658:          blk.73.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  659:           blk.73.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  660:           blk.73.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  661:             blk.73.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  662:           blk.73.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  663:             blk.73.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  664:        blk.73.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  665:             blk.73.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  666:             blk.73.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  667:           blk.74.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  668:             blk.74.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  669:        blk.74.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  670:             blk.74.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  671:             blk.74.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  672:          blk.74.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  673:           blk.74.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  674:             blk.74.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  675:           blk.74.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  676:          blk.75.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  677:           blk.75.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  678:           blk.75.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  679:             blk.75.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  680:           blk.75.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  681:             blk.75.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  682:        blk.75.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  683:             blk.75.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  684:             blk.75.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  685:          blk.76.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  686:           blk.76.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  687:           blk.76.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  688:             blk.76.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  689:           blk.76.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  690:             blk.76.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  691:        blk.76.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  692:             blk.76.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  693:             blk.76.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  694:          blk.77.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  695:           blk.77.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  696:           blk.77.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  697:             blk.77.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  698:           blk.77.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  699:             blk.77.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  700:        blk.77.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  701:             blk.77.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  702:             blk.77.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  703:          blk.78.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  704:           blk.78.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  705:           blk.78.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  706:             blk.78.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  707:           blk.78.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  708:             blk.78.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  709:        blk.78.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  710:             blk.78.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  711:             blk.78.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  712:          blk.79.attn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  713:           blk.79.ffn_down.weight q8_0     [ 28672,  8192,     1,     1 ]
llama_model_loader: - tensor  714:           blk.79.ffn_gate.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  715:             blk.79.ffn_up.weight q8_0     [  8192, 28672,     1,     1 ]
llama_model_loader: - tensor  716:           blk.79.ffn_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  717:             blk.79.attn_k.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  718:        blk.79.attn_output.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  719:             blk.79.attn_q.weight q8_0     [  8192,  8192,     1,     1 ]
llama_model_loader: - tensor  720:             blk.79.attn_v.weight q8_0     [  8192,  1024,     1,     1 ]
llama_model_loader: - tensor  721:               output_norm.weight f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  722:                    output.weight q8_0     [  8192, 32000,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv   4:                          llama.block_count u32              = 80
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 7
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  161 tensors
llama_model_loader: - type q8_0:  562 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 68.98 B
llm_load_print_meta: model size       = 68.26 GiB (8.50 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.26 MiB
llm_load_tensors: mem required  = 69896.55 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  160.00 MiB
llama_build_graph: non-view tensors processed: 1684/1684
llama_new_context_with_model: compute buffer total size = 148.07 MiB
GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS
zsh: abort      ./finetune --model-base  --checkpoint-out emploee_list-ITERATION.gguf      10
yanghoonkim commented 11 months ago

I am running into the same issue on the same device with a 70B quantized model.

slaren commented 11 months ago

Increasing GGML_MAX_PARAMS in ggml.h may solve the issue. I guess nobody has tried to finetune a 70B model until now.
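
For reference, the suggestion amounts to bumping the compile-time constant in ggml.h and rebuilding; the "before" value shown here is only an assumed default, so check the actual value in your checkout:

    // ggml.h -- illustrative sketch, not a verbatim diff
    //#define GGML_MAX_PARAMS 1024   // assumed default at the time
    #define GGML_MAX_PARAMS   2048   // large enough for ~2x the 723 tensors of this 70B model

After editing, rebuild the finetune binary (e.g. make clean && make finetune).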

yanghoonkim commented 11 months ago

@slaren what does GGML_MAX_PARAMS mean?

slaren commented 11 months ago

From what I can tell, it is the maximum number of trainable tensors in a graph, but I don't know a lot about the training code. @ggerganov and @xaedes would know more about this.

xaedes commented 11 months ago

Yep, it is the maximum number of trainable tensors. Since there are two trainable LoRA tensors for each adapted matrix, the required number is roughly twice the number of base model tensors. According to the log you posted there are 723 tensors, so you probably need at least 1446 for GGML_MAX_PARAMS.
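
To make the arithmetic explicit, here is a small stand-alone check using the counts from the llama_model_loader log above (a hypothetical helper, not part of the repo):

    // illustrative only: counts taken from the loader log for this 70B GGUF
    #include <stdio.h>

    int main(void) {
        const int base_tensors    = 723; // tensors reported by llama_model_loader
        const int lora_per_matrix = 2;   // one A and one B tensor per adapted matrix
        const int upper_bound     = base_tensors * lora_per_matrix; // 1446
        printf("up to ~%d trainable tensors -> GGML_MAX_PARAMS must exceed this\n",
               upper_bound);
        return 0;
    }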

Taikono-Himazin commented 11 months ago

For now I set it to 10240, roughly ten times the original value, and recompiled. Finetuning completed without any problems. It seems to me the limit could be raised or even removed entirely, but what do you think? Could increasing it cause a memory overflow?

slaren commented 11 months ago

It's used to size a local (stack-allocated) array, so it may cause a stack overflow if it is too big.
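
A self-contained sketch of the pattern being described; the real code in ggml.c differs in names and details, but the key point is that GGML_MAX_PARAMS sizes a local array of tensor pointers inside the optimizer:

    // simplified, self-contained illustration -- not the actual ggml.c code
    #include <assert.h>

    #define GGML_MAX_PARAMS 2048             // hypothetical increased value

    struct tensor { int is_param; };

    void collect_params(struct tensor * nodes, int n_nodes) {
        struct tensor * ps[GGML_MAX_PARAMS]; // local array: 2048 * 8 bytes = 16 KiB of stack
        int np = 0;
        for (int i = 0; i < n_nodes; ++i) {
            if (nodes[i].is_param) {
                assert(np < GGML_MAX_PARAMS); // same check that fires in this issue
                ps[np++] = &nodes[i];
            }
        }
        (void) ps;
        (void) np;
    }

At 8 bytes per pointer, even 10240 entries is only about 80 KiB, but an unbounded or very large value could exceed a small default thread stack.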

Taikono-Himazin commented 11 months ago

How about setting it to 2048 (2^11), then? I tried that locally and was able to finetune the 70B model without any problems.

Taikono-Himazin commented 11 months ago

I've submitted a pull request. Please take a look.