Prerequisites

Please answer the following questions for yourself before submitting an issue.
[x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
[x] I carefully followed the README.md.
[x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior

All words in the prompt and completions should be used.
Current Behavior

I ported the example code from batched.swift in the repo and noticed that certain words, such as "interested" and "Francisco" along with other seemingly random words, are skipped by the tokenizer in both the prompt and the response. This makes the completions nonsensical. The original example project in the repo has the same issue.
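To make the skipped words easier to pin down, a small word-level diff between the original prompt and the text reassembled from the tokens can help. The following is only a minimal sketch: `missingWords` is a hypothetical helper that is not part of llama.cpp or the example project, and the `roundTripped` string in the usage note is an illustrative input, not captured output.

```swift
import Foundation

/// Hypothetical diagnostic helper (not part of llama.cpp).
/// Returns the words of `original` that have no matching occurrence
/// in `roundTripped`, in the order they appear in `original`.
func missingWords(original: String, roundTripped: String) -> [String] {
    // Split on whitespace and newlines; punctuation stays attached to its word.
    func words(_ text: String) -> [String] {
        text.components(separatedBy: .whitespacesAndNewlines).filter { !$0.isEmpty }
    }

    // Count how many times each word is available in the round-tripped text.
    var available: [String: Int] = [:]
    for word in words(roundTripped) {
        available[word, default: 0] += 1
    }

    // Consume one occurrence per original word; anything left over was dropped.
    var missing: [String] = []
    for word in words(original) {
        if let count = available[word], count > 0 {
            available[word] = count - 1
        } else {
            missing.append(word)
        }
    }
    return missing
}
```

For example, calling `missingWords(original: "I'm interested in museums and zoos.", roundTripped: "I'm in museums and zoos.")` returns `["interested"]`. Running it on the full prompt and the concatenated token pieces would list every dropped word in one pass, which may show whether the skipped words share a property such as length.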
Environment and Context

Hardware: M2 Max MacBook Pro (32GB), M2 Mac Mini (24GB)
Operating system: macOS 14.0
SDK: Swift (macOS app) via Swift Package
Failure Information (for bugs)

Please help provide information about the failure / bug.
I used the same tokenization code as the example project.
Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Run the example batched.swift project.
Load the LLaMA-2-13B-Chat model (I used the q4_0 GGUF file shown in the logs below).
Use the prompt: "Transcript of a text message, where Ethan interacts with his virtual assistant, Nova. Nova is helpful, kind, honest, good at writing, and never fails to answer Ethan's requests immediately and with precision.\nETHAN: Hello, what can you tell me about Cupertino and it's surounding areas? I'm interested in museums and zoos.\n"
Logs

llama_model_loader: loaded meta data with 16 key-value pairs and 363 tensors from /Users/ethan/Downloads/llama-2-13b-chat/ggml-model-q4_0.gguf (version unknown)
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    8:            blk.0.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    9:              blk.0.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   10:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   11:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   12:              blk.1.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   13:              blk.1.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   14:              blk.1.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   15:         blk.1.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   16:            blk.1.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   17:            blk.1.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   18:              blk.1.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   19:           blk.1.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   20:            blk.1.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   21:              blk.2.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   22:              blk.2.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   23:              blk.2.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   24:         blk.2.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   25:            blk.2.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   26:            blk.2.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   27:              blk.2.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   28:           blk.2.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   29:            blk.2.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   30:              blk.3.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   31:              blk.3.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   32:              blk.3.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   33:         blk.3.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   34:            blk.3.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   35:            blk.3.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   36:              blk.3.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   37:           blk.3.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   38:            blk.3.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   39:              blk.4.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   40:              blk.4.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   41:              blk.4.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   42:         blk.4.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   43:            blk.4.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   44:            blk.4.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   45:              blk.4.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   46:           blk.4.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   47:            blk.4.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   48:              blk.5.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   49:              blk.5.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   50:              blk.5.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   51:         blk.5.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   52:            blk.5.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   53:            blk.5.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   54:              blk.5.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   55:           blk.5.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   56:            blk.5.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   57:              blk.6.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   58:              blk.6.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   59:              blk.6.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   60:         blk.6.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   61:            blk.6.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   62:            blk.6.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   63:              blk.6.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   64:           blk.6.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   65:            blk.6.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   66:              blk.7.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   67:              blk.7.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   68:              blk.7.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   69:         blk.7.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   70:            blk.7.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   71:            blk.7.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   72:              blk.7.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   73:           blk.7.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   74:            blk.7.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   75:              blk.8.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   76:              blk.8.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   77:              blk.8.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   78:         blk.8.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   79:            blk.8.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   80:            blk.8.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   81:              blk.8.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   82:           blk.8.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   83:            blk.8.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   84:              blk.9.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   85:              blk.9.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   86:              blk.9.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   87:         blk.9.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   88:            blk.9.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   89:            blk.9.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   90:              blk.9.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   91:           blk.9.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   92:            blk.9.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor   93:             blk.10.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   94:             blk.10.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   95:             blk.10.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   96:        blk.10.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor   97:           blk.10.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor   98:           blk.10.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor   99:             blk.10.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  100:          blk.10.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  101:           blk.10.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  102:             blk.11.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  103:             blk.11.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  104:             blk.11.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  105:        blk.11.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  106:           blk.11.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  107:           blk.11.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  108:             blk.11.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  109:          blk.11.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  110:           blk.11.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  111:             blk.12.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  112:             blk.12.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  113:             blk.12.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  114:        blk.12.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  115:           blk.12.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  116:           blk.12.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  117:             blk.12.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  118:          blk.12.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  119:           blk.12.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  120:             blk.13.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  121:             blk.13.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  122:             blk.13.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  123:        blk.13.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  124:           blk.13.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  125:           blk.13.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  126:             blk.13.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  127:          blk.13.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  128:           blk.13.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  129:             blk.14.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  130:             blk.14.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  131:             blk.14.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  132:        blk.14.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  133:           blk.14.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  134:           blk.14.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  135:             blk.14.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  136:          blk.14.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  137:           blk.14.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  138:             blk.15.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  139:             blk.15.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  140:             blk.15.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  141:        blk.15.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  142:           blk.15.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  143:           blk.15.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  144:             blk.15.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  145:          blk.15.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  146:           blk.15.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  147:             blk.16.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  148:             blk.16.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  149:             blk.16.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  150:        blk.16.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  151:           blk.16.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  152:           blk.16.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  153:             blk.16.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  154:          blk.16.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  155:           blk.16.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  156:             blk.17.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  157:             blk.17.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  158:             blk.17.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  159:        blk.17.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  160:           blk.17.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  161:           blk.17.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  162:             blk.17.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  163:          blk.17.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  164:           blk.17.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  165:             blk.18.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  166:             blk.18.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  167:             blk.18.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  168:        blk.18.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  169:           blk.18.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  170:           blk.18.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  171:             blk.18.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  172:          blk.18.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  173:           blk.18.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  174:             blk.19.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  175:             blk.19.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  176:             blk.19.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  177:        blk.19.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  178:           blk.19.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  179:           blk.19.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  180:             blk.19.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  181:          blk.19.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  182:           blk.19.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  183:             blk.20.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  184:             blk.20.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  185:             blk.20.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  186:        blk.20.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  187:           blk.20.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  188:           blk.20.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  189:             blk.20.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  190:          blk.20.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  191:           blk.20.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  192:             blk.21.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  193:             blk.21.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  194:             blk.21.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  195:        blk.21.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  196:           blk.21.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  197:           blk.21.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  198:             blk.21.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  199:          blk.21.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  200:           blk.21.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  201:             blk.22.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  202:             blk.22.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  203:             blk.22.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  204:        blk.22.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  205:           blk.22.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  206:           blk.22.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  207:             blk.22.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  208:          blk.22.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  209:           blk.22.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  210:             blk.23.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  211:             blk.23.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  212:             blk.23.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  213:        blk.23.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  214:           blk.23.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  215:           blk.23.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  216:             blk.23.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  217:          blk.23.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  218:           blk.23.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  219:             blk.24.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  220:             blk.24.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  221:             blk.24.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  222:        blk.24.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  223:           blk.24.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  224:           blk.24.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  225:             blk.24.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  226:          blk.24.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  227:           blk.24.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  228:             blk.25.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  229:             blk.25.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  230:             blk.25.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  231:        blk.25.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  232:           blk.25.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  233:           blk.25.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  234:             blk.25.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  235:          blk.25.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  236:           blk.25.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  237:             blk.26.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  238:             blk.26.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  239:             blk.26.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  240:        blk.26.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  241:           blk.26.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  242:           blk.26.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  243:             blk.26.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  244:          blk.26.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  245:           blk.26.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  246:             blk.27.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  247:             blk.27.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  248:             blk.27.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  249:        blk.27.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  250:           blk.27.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  251:           blk.27.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  252:             blk.27.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  253:          blk.27.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  254:           blk.27.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  255:             blk.28.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  256:             blk.28.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  257:             blk.28.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  258:        blk.28.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  259:           blk.28.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  260:           blk.28.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  261:             blk.28.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  262:          blk.28.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  263:           blk.28.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  264:             blk.29.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  265:             blk.29.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  266:             blk.29.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  267:        blk.29.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  268:           blk.29.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  269:           blk.29.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  270:             blk.29.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  271:          blk.29.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  272:           blk.29.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  273:             blk.30.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  274:             blk.30.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  275:             blk.30.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  276:        blk.30.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  277:           blk.30.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  278:           blk.30.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  279:             blk.30.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  280:          blk.30.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  281:           blk.30.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  282:             blk.31.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  283:             blk.31.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  284:             blk.31.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  285:        blk.31.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  286:           blk.31.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  287:           blk.31.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  288:             blk.31.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  289:          blk.31.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  290:           blk.31.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  291:             blk.32.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  292:             blk.32.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  293:             blk.32.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  294:        blk.32.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  295:           blk.32.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  296:           blk.32.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  297:             blk.32.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  298:          blk.32.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  299:           blk.32.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  300:             blk.33.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  301:             blk.33.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  302:             blk.33.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  303:        blk.33.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  304:           blk.33.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  305:           blk.33.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  306:             blk.33.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  307:          blk.33.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  308:           blk.33.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  309:             blk.34.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  310:             blk.34.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  311:             blk.34.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  312:        blk.34.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  313:           blk.34.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  314:           blk.34.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  315:             blk.34.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  316:          blk.34.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  317:           blk.34.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  318:             blk.35.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  319:             blk.35.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  320:             blk.35.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  321:        blk.35.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  322:           blk.35.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  323:           blk.35.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  324:             blk.35.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  325:          blk.35.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  326:           blk.35.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  327:             blk.36.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  328:             blk.36.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  329:             blk.36.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  330:        blk.36.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  331:           blk.36.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  332:           blk.36.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  333:             blk.36.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  334:          blk.36.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  335:           blk.36.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  336:             blk.37.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  337:             blk.37.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  338:             blk.37.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  339:        blk.37.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  340:           blk.37.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  341:           blk.37.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  342:             blk.37.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  343:          blk.37.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  344:           blk.37.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  345:             blk.38.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  346:             blk.38.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  347:             blk.38.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  348:        blk.38.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  349:           blk.38.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  350:           blk.38.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  351:             blk.38.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  352:          blk.38.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  353:           blk.38.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  354:             blk.39.attn_q.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  355:             blk.39.attn_k.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  356:             blk.39.attn_v.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  357:        blk.39.attn_output.weight q4_0     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor  358:           blk.39.ffn_gate.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  359:           blk.39.ffn_down.weight q4_0     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor  360:             blk.39.ffn_up.weight q4_0     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor  361:          blk.39.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor  362:           blk.39.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                       llama.context_length u32     
llama_model_loader: - kv   3:                     llama.embedding_length u32     
llama_model_loader: - kv   4:                          llama.block_count u32     
llama_model_loader: - kv   5:                  llama.feed_forward_length u32     
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32     
llama_model_loader: - kv   7:                 llama.attention.head_count u32     
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32     
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32     
llama_model_loader: - kv  10:                          general.file_type u32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  15:               general.quantization_version u32     
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_0:  281 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = unknown
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 5120
llm_load_print_meta: n_head           = 40
llm_load_print_meta: n_head_kv        = 40
llm_load_print_meta: n_layer          = 40
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 13824
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = mostly Q4_0
llm_load_print_meta: model params     = 13.02 B
llm_load_print_meta: model size       = 6.86 GiB (4.53 BPW) 
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: mem required  = 7024.02 MB
...................................................................................................
2023-10-26T20:54:47-0400 notice codes.vapor.application : [Vapor] Server starting on http://127.0.0.1:2048
2023-10-26T20:54:57-0400 info codes.vapor.application : request-id=AF2C1C0F-0AAA-415F-A9DD-7E923752D8A8 [Vapor] GET /chat
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  400.00 MB
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Max
ggml_metal_init: picking default device: Apple M2 Max
ggml_metal_init: loading '/Users/ethan/Library/Developer/Xcode/DerivedData/Nova-grimsnzdjhdhydbzqrdabadnhwti/Build/Products/Debug/Nova Server.app/Contents/Resources/llama_llama.bundle/Contents/Resources/default.metallib'
ggml_metal_init: loaded kernel_add                         0x600001ad79d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_add_row                     0x600001ad7a20 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul                         0x600001ad7bb0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_row                     0x600001a9a580 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale                       0x600001a9a710 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_scale_4                     0x600001a9a8a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_silu                        0x600001a9aa30 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_relu                        0x600001a9abc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_gelu                        0x600001a9ad50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max                    0x600001aeb390 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_soft_max_4                  0x600001aeb5c0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf               0x600001aeaf30 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_diag_mask_inf_8             0x600001a9de00 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f32                0x600001a9d4a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_f16                0x600001a9d040 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_0               0x600001a9def0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_1               0x600001a84000 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_0               0x600001a84140 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_1               0x600001a842d0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q8_0               0x600001a84460 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q2_K               0x600001a845f0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q3_K               0x600001a84780 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q4_K               0x600001a84910 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q5_K               0x600001a84aa0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_get_rows_q6_K               0x600001a84c30 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rms_norm                    0x600001a84dc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_norm                        0x600001a84f50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f32_f32              0x600001a850e0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32              0x600001a85270 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_1row         0x600001a85400 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_f16_f32_l4           0x600001a85590 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_0_f32             0x600001a85720 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_1_f32             0x600001a858b0 | th_max =  896 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q5_0_f32             0x600001a85a40 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q5_1_f32             0x600001a85bd0 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q8_0_f32             0x600001a85d60 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q2_K_f32             0x600001a85ef0 | th_max =  640 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q3_K_f32             0x600001a86080 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q4_K_f32             0x600001a86210 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q5_K_f32             0x600001a863a0 | th_max =  576 | th_width =   32
ggml_metal_init: loaded kernel_mul_mv_q6_K_f32             0x600001a86530 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f32_f32              0x600001a866c0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_f16_f32              0x600001a86850 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_0_f32             0x600001a869e0 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_1_f32             0x600001a86b70 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_0_f32             0x600001a86d00 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_1_f32             0x600001a86e90 | th_max =  704 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q8_0_f32             0x600001a87020 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q2_K_f32             0x600001aa4730 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q3_K_f32             0x600001aa4a00 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q4_K_f32             0x600001aa4000 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q5_K_f32             0x600001aa4320 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_mul_mm_q6_K_f32             0x600001aa44b0 | th_max =  768 | th_width =   32
ggml_metal_init: loaded kernel_rope_f32                    0x600001aa4690 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_rope_f16                    0x600001aa4dc0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_alibi_f32                   0x600001aa4f50 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f16                 0x600001aa5180 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f32_f32                 0x600001aa5310 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_cpy_f16_f16                 0x600001aa54a0 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_concat                      0x600001aa5630 | th_max = 1024 | th_width =   32
ggml_metal_init: loaded kernel_sqr                         0x600001aa57c0 | th_max = 1024 | th_width =   32
ggml_metal_init: GPU name:   Apple M2 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 21845.34 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 81.13 MB
llama_new_context_with_model: max tensor size =   128.17 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7024.61 MB, ( 7026.23 / 21845.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   400.02 MB, ( 7426.25 / 21845.34)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    75.02 MB, ( 7501.27 / 21845.34)
 Transcript of a text message, where Ethans with his virtual, Nova. Nova is helpful, kind, honest, good at writing, and never fails to answer Ethan's and with.
ETHAN: Hello, what can you tell me about Cupertino and it's surounding areas? I'm in museums and zoos.

NOVA: Hello Ethan! Cupertino is a city located in the heart of Silicon Valley,. It's by many exciting places to visit, world-class museums and zoos.

For museums, I the de Young Museum in San, which an impressive of art and exhibits. The Academy of is another great option, with exhibits and a stunning rainforest exhibit.

If you're in zoos, the San Zoo is a must-visit. It's home to over 250 species of animals, penguins, giraffes, and lions. The Oakland Zoo is another great option, with a variety of animals and a setting.

I hope this helps, Ethan! Let me know if you have any other or if there's else I can assist you with.

ETHAN:decoded 209 tokens in 10.43 s, speed: 20.04 t/s

Hello Ethan! Cupertino is a city located in the heart of Silicon Valley,. It's by many exciting places to visit, world-class museums and zoos.

For museums, I the de Young Museum in San, which an impressive of art and exhibits. The Academy of is another great option, with exhibits and a stunning rainforest exhibit.

If you're in zoos, the San Zoo is a must-visit. It's home to over 250 species of animals, penguins, giraffes, and lions. The Oakland Zoo is another great option, with a variety of animals and a setting.

I hope this helps, Ethan! Let me know if you have any other or if there's else I can assist you with.
ggml_metal_free: deallocating
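
To make the skipped words easier to pin down, the echoed prompt in the log can be diffed against the prompt that was sent. The following is a minimal sketch, not code from the example project: `words(in:)` and `droppedWords(expected:actual:)` are hypothetical helper names, and the comparison is a plain word-level diff built on the standard library's `difference(from:)`, so it is a heuristic rather than a tokenizer-level check.

```swift
import Foundation

/// Splits text into runs of letters, so punctuation and whitespace are ignored.
/// (Hypothetical helper, introduced only for this illustration.)
func words(in text: String) -> [String] {
    text.split(whereSeparator: { !$0.isLetter }).map(String.init)
}

/// Returns the words of `expected` that a word-level diff cannot find in
/// `actual`, in their original order.
/// (Hypothetical helper, introduced only for this illustration.)
func droppedWords(expected: String, actual: String) -> [String] {
    // `difference(from:)` describes how to turn `actual` into `expected`;
    // its insertions are therefore the words missing from `actual`.
    let diff = words(in: expected).difference(from: words(in: actual))
    return diff.insertions.compactMap { change -> String? in
        if case let .insert(_, element, _) = change { return element }
        return nil
    }
}

// Last sentence of the prompt as sent, and as echoed back in the log above.
let sent = "I'm interested in museums and zoos."
let echoed = "I'm in museums and zoos."

for word in droppedWords(expected: sent, actual: echoed) {
    print("\(word) (\(word.utf8.count) bytes)")
}
// prints "interested (10 bytes)"
```

Because the diff works on whole words, a piece dropped in the middle of a phrase (for example "Ethan interacts" echoed as "Ethans") may also flag the neighbouring word. Printing the byte length next to each word is only meant to help spot a pattern among the dropped pieces.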
ggerganov / llama.cpp

Tokens being skipped in Swift #3807