ggerganov / llama.cpp

LLM inference in C/C++
MIT License

[User] “'token_embd.weight' has wrong shape” when loading WizardLM-Uncensored-Falcon-40b #2894

Closed · rlanday closed this issue 6 months ago

rlanday commented 1 year ago

Expected Behavior

I have been trying to run my favorite model, WizardLM-Uncensored-Falcon-40b, in llama.cpp, now that it has Falcon support (I have been running it in ggllm.cpp). I expected that, being a derivative of a standard Falcon model, this model should now work in llama.cpp.

Link to the model: https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b

Current Behavior

I have tried multiple times (on different revisions) to convert the model to gguf format using the latest code available:

python convert-falcon-hf-to-gguf.py /Volumes/Storage/ML\ models/WizardLM-Uncensored-Falcon-40b/ 1

This script runs successfully. However, every time I try to run the resulting model (or a quantized version thereof), I get this error:

error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 65024, got 8192, 65025, 1, 1

Apparently there is one extra token (padding?) in the embedding table that llama.cpp is not expecting.
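A minimal sketch of one way to confirm that the extra row comes from the checkpoint itself, assuming the usual pytorch_model-*.bin shard names (adjust the glob if the repo ships different file names):

```python
# Sketch: print the word-embedding shape straight from the HF checkpoint shards.
import glob

import torch

for shard in sorted(glob.glob("pytorch_model-*.bin")):
    state_dict = torch.load(shard, map_location="cpu")
    for name, tensor in state_dict.items():
        # Falcon checkpoints typically name this transformer.word_embeddings.weight
        if "word_embeddings" in name:
            print(name, tuple(tensor.shape))  # reportedly (65025, 8192) here
```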

Environment and Context

rlanday@Ryans-MBP-2 llama.cpp % sysctl -a | grep machdep.cpu
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 286531872
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.xsave.extended_state: 31 832 1088 0
machdep.cpu.xsave.extended_state1: 15 832 256 0
machdep.cpu.arch_perf.version: 4
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 4
machdep.cpu.cache.size: 256
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.tsc_ccc.numerator: 192
machdep.cpu.tsc_ccc.denominator: 2
machdep.cpu.max_basic: 22
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
machdep.cpu.family: 6
machdep.cpu.model: 158
machdep.cpu.extmodel: 9
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 13
machdep.cpu.feature_bits: 9221960262849657855
machdep.cpu.leaf7_feature_bits: 43804591 1073741824
machdep.cpu.leaf7_feature_bits_edx: 3154120192
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.signature: 591597
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT SGXLC MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 248
machdep.cpu.processor_flag: 5
machdep.cpu.core_count: 8
machdep.cpu.thread_count: 16

rlanday@Ryans-MBP-2 llama.cpp % uname -a
Darwin Ryans-MacBook-Pro-2.local 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:22 PDT 2023; root:xnu-8796.121.3~7/RELEASE_X86_64 x86_64

rlanday@Ryans-MBP-2 llama.cpp % python3 --version
Python 3.11.4

rlanday@Ryans-MBP-2 llama.cpp % make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0

rlanday@Ryans-MBP-2 llama.cpp % g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Failure Information (for bugs)

Steps to Reproduce

  1. Download the model at https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b
  2. Convert it to gguf format using python convert-falcon-hf-to-gguf.py <path to model> 1
  3. Attempt to run inference using the model, e.g.
    
    ./main --ctx_size 16384 -m <path to model>  --top_p 0 --top_k 40 --temp 0.7 --repeat_penalty 1.176470588235294 -t 8 -n -1 --repeat_last_n 256 -p "Please come up with a plan to fix San Francisco.

"


Failure Logs

rlanday@Ryans-MBP-2 llama.cpp % ./main --ctx_size 16384 -m /Volumes/Storage/ML\ models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf --top_p 0 --top_k 40 --temp 0.7 --repeat_penalty 1.176470588235294 -t 8 -n -1 --repeat_last_n 256 -p "Please come up with a plan to fix San Francisco.

lscpu" main: warning: base model only supports context sizes no greater than 2048 tokens (16384 specified) main: build = 1119 (06abf8e) main: seed = 1693374095 llama_model_loader: loaded meta data with 18 key-value pairs and 484 tensors from /Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf (version GGUF V1 (support until nov 2023)) llama_model_loader: - tensor 0: token_embd.weight f16 [ 8192, 65025, 1, 1 ] llama_model_loader: - tensor 1: blk.0.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 2: blk.0.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 3: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 4: blk.0.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 5: blk.0.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 6: blk.0.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 7: blk.0.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 8: blk.0.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 9: blk.1.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 10: blk.1.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 11: blk.1.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 12: blk.1.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 13: blk.1.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 14: blk.1.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 15: blk.1.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 16: blk.1.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 17: blk.2.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 18: blk.2.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 19: blk.2.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 20: blk.2.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 21: blk.2.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 22: blk.2.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 23: blk.2.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 24: blk.2.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 25: blk.3.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 26: blk.3.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 27: blk.3.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 28: blk.3.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 29: blk.3.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 30: blk.3.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 31: blk.3.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 32: blk.3.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 33: blk.4.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 34: blk.4.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 35: blk.4.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 36: blk.4.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 37: blk.4.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 38: blk.4.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 39: blk.4.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 
40: blk.4.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 41: blk.5.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 42: blk.5.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 43: blk.5.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 44: blk.5.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 45: blk.5.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 46: blk.5.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 47: blk.5.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 48: blk.5.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 49: blk.6.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 50: blk.6.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 51: blk.6.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 52: blk.6.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 53: blk.6.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 54: blk.6.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 55: blk.6.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 56: blk.6.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 57: blk.7.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 58: blk.7.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 59: blk.7.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 60: blk.7.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 61: blk.7.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 62: blk.7.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 63: blk.7.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 64: blk.7.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 65: blk.8.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 66: blk.8.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 67: blk.8.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 68: blk.8.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 69: blk.8.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 70: blk.8.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 71: blk.8.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 72: blk.8.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 73: blk.9.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 74: blk.9.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 75: blk.9.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 76: blk.9.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 77: blk.9.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 78: blk.9.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 79: blk.9.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 80: blk.9.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 81: blk.10.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 82: blk.10.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 83: blk.10.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 84: blk.10.attn_norm.bias f32 [ 8192, 1, 1, 1 ] 
llama_model_loader: - tensor 85: blk.10.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 86: blk.10.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 87: blk.10.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 88: blk.10.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 89: blk.11.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 90: blk.11.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 91: blk.11.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 92: blk.11.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 93: blk.11.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 94: blk.11.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 95: blk.11.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 96: blk.11.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 97: blk.12.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 98: blk.12.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 99: blk.12.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 100: blk.12.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 101: blk.12.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 102: blk.12.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 103: blk.12.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 104: blk.12.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 105: blk.13.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 106: blk.13.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 107: blk.13.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 108: blk.13.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 109: blk.13.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 110: blk.13.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 111: blk.13.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 112: blk.13.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 113: blk.14.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 114: blk.14.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 115: blk.14.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 116: blk.14.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 117: blk.14.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 118: blk.14.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 119: blk.14.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 120: blk.14.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 121: blk.15.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 122: blk.15.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 123: blk.15.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 124: blk.15.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 125: blk.15.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 126: blk.15.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 127: blk.15.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 128: blk.15.ffn_down.weight f16 [ 
32768, 8192, 1, 1 ] llama_model_loader: - tensor 129: blk.16.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 130: blk.16.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 131: blk.16.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 132: blk.16.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 133: blk.16.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 134: blk.16.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 135: blk.16.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 136: blk.16.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 137: blk.17.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 138: blk.17.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 139: blk.17.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 140: blk.17.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 141: blk.17.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 142: blk.17.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 143: blk.17.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 144: blk.17.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 145: blk.18.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 146: blk.18.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 147: blk.18.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 148: blk.18.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 149: blk.18.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 150: blk.18.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 151: blk.18.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 152: blk.18.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 153: blk.19.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 154: blk.19.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 155: blk.19.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 156: blk.19.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 157: blk.19.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 158: blk.19.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 159: blk.19.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 160: blk.19.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 161: blk.20.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 162: blk.20.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 163: blk.20.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 164: blk.20.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 165: blk.20.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 166: blk.20.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 167: blk.20.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 168: blk.20.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 169: blk.21.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 170: blk.21.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 171: blk.21.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 172: 
blk.21.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 173: blk.21.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 174: blk.21.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 175: blk.21.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 176: blk.21.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 177: blk.22.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 178: blk.22.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 179: blk.22.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 180: blk.22.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 181: blk.22.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 182: blk.22.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 183: blk.22.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 184: blk.22.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 185: blk.23.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 186: blk.23.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 187: blk.23.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 188: blk.23.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 189: blk.23.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 190: blk.23.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 191: blk.23.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 192: blk.23.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 193: blk.24.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 194: blk.24.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 195: blk.24.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 196: blk.24.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 197: blk.24.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 198: blk.24.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 199: blk.24.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 200: blk.24.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 201: blk.25.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 202: blk.25.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 203: blk.25.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 204: blk.25.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 205: blk.25.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 206: blk.25.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 207: blk.25.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 208: blk.25.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 209: blk.26.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 210: blk.26.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 211: blk.26.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 212: blk.26.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 213: blk.26.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 214: blk.26.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 215: blk.26.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] 
llama_model_loader: - tensor 216: blk.26.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 217: blk.27.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 218: blk.27.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 219: blk.27.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 220: blk.27.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 221: blk.27.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 222: blk.27.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 223: blk.27.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 224: blk.27.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 225: blk.28.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 226: blk.28.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 227: blk.28.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 228: blk.28.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 229: blk.28.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 230: blk.28.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 231: blk.28.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 232: blk.28.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 233: blk.29.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 234: blk.29.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 235: blk.29.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 236: blk.29.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 237: blk.29.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 238: blk.29.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 239: blk.29.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 240: blk.29.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 241: blk.30.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 242: blk.30.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 243: blk.30.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 244: blk.30.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 245: blk.30.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 246: blk.30.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 247: blk.30.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 248: blk.30.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 249: blk.31.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 250: blk.31.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 251: blk.31.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 252: blk.31.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 253: blk.31.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 254: blk.31.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 255: blk.31.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 256: blk.31.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 257: blk.32.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 258: blk.32.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 259: 
blk.32.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 260: blk.32.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 261: blk.32.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 262: blk.32.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 263: blk.32.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 264: blk.32.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 265: blk.33.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 266: blk.33.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 267: blk.33.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 268: blk.33.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 269: blk.33.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 270: blk.33.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 271: blk.33.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 272: blk.33.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 273: blk.34.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 274: blk.34.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 275: blk.34.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 276: blk.34.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 277: blk.34.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 278: blk.34.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 279: blk.34.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 280: blk.34.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 281: blk.35.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 282: blk.35.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 283: blk.35.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 284: blk.35.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 285: blk.35.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 286: blk.35.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 287: blk.35.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 288: blk.35.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 289: blk.36.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 290: blk.36.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 291: blk.36.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 292: blk.36.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 293: blk.36.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 294: blk.36.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 295: blk.36.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 296: blk.36.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 297: blk.37.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 298: blk.37.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 299: blk.37.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 300: blk.37.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 301: blk.37.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 302: blk.37.attn_output.weight f16 [ 8192, 8192, 1, 1 ] 
llama_model_loader: - tensor 303: blk.37.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 304: blk.37.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 305: blk.38.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 306: blk.38.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 307: blk.38.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 308: blk.38.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 309: blk.38.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 310: blk.38.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 311: blk.38.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 312: blk.38.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 313: blk.39.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 314: blk.39.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 315: blk.39.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 316: blk.39.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 317: blk.39.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 318: blk.39.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 319: blk.39.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 320: blk.39.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 321: blk.40.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 322: blk.40.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 323: blk.40.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 324: blk.40.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 325: blk.40.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 326: blk.40.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 327: blk.40.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 328: blk.40.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 329: blk.41.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 330: blk.41.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 331: blk.41.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 332: blk.41.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 333: blk.41.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 334: blk.41.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 335: blk.41.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 336: blk.41.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 337: blk.42.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 338: blk.42.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 339: blk.42.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 340: blk.42.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 341: blk.42.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 342: blk.42.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 343: blk.42.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 344: blk.42.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 345: blk.43.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 346: 
blk.43.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 347: blk.43.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 348: blk.43.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 349: blk.43.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 350: blk.43.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 351: blk.43.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 352: blk.43.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 353: blk.44.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 354: blk.44.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 355: blk.44.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 356: blk.44.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 357: blk.44.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 358: blk.44.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 359: blk.44.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 360: blk.44.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 361: blk.45.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 362: blk.45.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 363: blk.45.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 364: blk.45.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 365: blk.45.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 366: blk.45.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 367: blk.45.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 368: blk.45.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 369: blk.46.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 370: blk.46.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 371: blk.46.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 372: blk.46.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 373: blk.46.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 374: blk.46.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 375: blk.46.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 376: blk.46.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 377: blk.47.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 378: blk.47.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 379: blk.47.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 380: blk.47.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 381: blk.47.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 382: blk.47.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 383: blk.47.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 384: blk.47.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 385: blk.48.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 386: blk.48.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 387: blk.48.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 388: blk.48.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 389: blk.48.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] 
llama_model_loader: - tensor 390: blk.48.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 391: blk.48.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 392: blk.48.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 393: blk.49.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 394: blk.49.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 395: blk.49.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 396: blk.49.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 397: blk.49.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 398: blk.49.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 399: blk.49.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 400: blk.49.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 401: blk.50.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 402: blk.50.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 403: blk.50.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 404: blk.50.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 405: blk.50.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 406: blk.50.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 407: blk.50.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 408: blk.50.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 409: blk.51.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 410: blk.51.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 411: blk.51.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 412: blk.51.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 413: blk.51.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 414: blk.51.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 415: blk.51.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 416: blk.51.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 417: blk.52.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 418: blk.52.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 419: blk.52.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 420: blk.52.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 421: blk.52.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 422: blk.52.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 423: blk.52.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 424: blk.52.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 425: blk.53.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 426: blk.53.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 427: blk.53.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 428: blk.53.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 429: blk.53.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 430: blk.53.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 431: blk.53.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 432: blk.53.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 433: 
blk.54.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 434: blk.54.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 435: blk.54.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 436: blk.54.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 437: blk.54.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 438: blk.54.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 439: blk.54.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 440: blk.54.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 441: blk.55.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 442: blk.55.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 443: blk.55.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 444: blk.55.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 445: blk.55.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 446: blk.55.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 447: blk.55.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 448: blk.55.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 449: blk.56.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 450: blk.56.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 451: blk.56.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 452: blk.56.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 453: blk.56.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 454: blk.56.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 455: blk.56.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 456: blk.56.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 457: blk.57.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 458: blk.57.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 459: blk.57.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 460: blk.57.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 461: blk.57.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 462: blk.57.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 463: blk.57.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 464: blk.57.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 465: blk.58.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 466: blk.58.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 467: blk.58.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 468: blk.58.attn_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 469: blk.58.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 470: blk.58.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 471: blk.58.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 472: blk.58.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 473: blk.59.attn_norm_2.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 474: blk.59.attn_norm_2.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 475: blk.59.attn_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 476: blk.59.attn_norm.bias f32 [ 8192, 1, 1, 1 ] 
llama_model_loader: - tensor 477: blk.59.attn_qkv.weight f16 [ 8192, 9216, 1, 1 ] llama_model_loader: - tensor 478: blk.59.attn_output.weight f16 [ 8192, 8192, 1, 1 ] llama_model_loader: - tensor 479: blk.59.ffn_up.weight f16 [ 8192, 32768, 1, 1 ] llama_model_loader: - tensor 480: blk.59.ffn_down.weight f16 [ 32768, 8192, 1, 1 ] llama_model_loader: - tensor 481: output_norm.weight f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 482: output_norm.bias f32 [ 8192, 1, 1, 1 ] llama_model_loader: - tensor 483: output.weight f16 [ 8192, 65025, 1, 1 ] llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: falcon.context_length u32
llama_model_loader: - kv 3: falcon.tensor_data_layout str
llama_model_loader: - kv 4: falcon.embedding_length u32
llama_model_loader: - kv 5: falcon.feed_forward_length u32
llama_model_loader: - kv 6: falcon.block_count u32
llama_model_loader: - kv 7: falcon.attention.head_count u32
llama_model_loader: - kv 8: falcon.attention.head_count_kv u32
llama_model_loader: - kv 9: falcon.attention.layer_norm_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.merges arr
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr
llama_model_loader: - kv 14: tokenizer.ggml.scores arr
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32
llama_model_loader: - type f32: 242 tensors
llama_model_loader: - type f16: 242 tensors
llm_load_print_meta: format = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch = falcon
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 65024
llm_load_print_meta: n_merges = 64784
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 16384
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 128
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 60
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_gqa = 16
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 32768
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 40B
llm_load_print_meta: model ftype = mostly F16
llm_load_print_meta: model size = 41.84 B
llm_load_print_meta: general.name = Falcon
llm_load_print_meta: BOS token = 1 '>>ABSTRACT<<'
llm_load_print_meta: EOS token = 2 '>>INTRODUCTION<<'
llm_load_print_meta: LF token = 193 ' '
llm_load_tensors: ggml ctx size = 0.16 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 65024, got 8192, 65025, 1, 1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf'
main: error: unable to load model


Additional environment info:

rlanday@Ryans-MBP-2 llama.cpp % git log | head -1

commit 06abf8eebabe086ca4003dee2754ab45032cd3fd

rlanday@Ryans-MBP-2 llama.cpp % pip list | egrep "torch|numpy|sentencepiece"
numpy 1.24.0
sentencepiece 0.1.98
torch 2.0.1

KerfuffleV2 commented 1 year ago

Not related to your current issue but

llm_load_print_meta: BOS token = 1 '>>ABSTRACT<<'
llm_load_print_meta: EOS token = 2 '>>INTRODUCTION<<'

is definitely wrong. In the base model it should be token id 11 for both (<|endoftext|>).

akawrykow commented 1 year ago

If you change this line: https://github.com/ggerganov/llama.cpp/blob/b532a69b2fd08067f34f32f37a2fd9b37678a34a/convert-falcon-hf-to-gguf.py#L134

to:

vocab_size = hparams["vocab_size"]

then re-convert, does it work?

I have noticed in other Falcon models (e.g. falcon-rw-1b) that the stated vocab size from the config doesn't match the number of tokens in tokenizer.json. The script already pads with arbitrary tokens -- in this case, only a single extra token is added.
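For illustration only, a rough sketch of the padding idea being described, using hypothetical variable names (hparams loaded from config.json, the token list read from tokenizer.json); this is not the actual convert-falcon-hf-to-gguf.py code:

```python
# Hypothetical sketch of the proposed vocab handling (not the real script).
import json

with open("config.json") as f:
    hparams = json.load(f)
with open("tokenizer.json", encoding="utf-8") as f:
    tokenizer = json.load(f)

tokens = list(tokenizer["model"]["vocab"].keys())

# Proposed change: trust the vocab size declared in config.json rather than
# the token count derived from tokenizer.json (one smaller for this model).
vocab_size = hparams["vocab_size"]

# Pad the token list with arbitrary placeholder tokens so the exported vocab
# and the checkpoint's embedding table end up the same length.
while len(tokens) < vocab_size:
    tokens.append(f"[PAD{len(tokens)}]")

assert len(tokens) == vocab_size
```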

akawrykow commented 1 year ago

@KerfuffleV2 FWIW, the config specifies token IDs 1 and 2: https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/blob/main/config.json#L14-L15

which do map to '>>ABSTRACT<<' and '>>INTRODUCTION<<': https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/raw/main/tokenizer.json

although there is an additional tokenizer_config.json which seems to override EOS: https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-40b/blob/main/tokenizer_config.json#L4 - I don't think we're accounting for this in the script, but I couldn't find anything similar for BOS.

akawrykow commented 1 year ago

I also noticed that tokenizer_config.json specifies:

  "padding": {
    "strategy": "BatchLongest",
    "direction": "Right",
    "pad_to_multiple_of": null,
    "pad_id": 65024,
    "pad_type_id": 0,
    "pad_token": "[PAD]"
  },

This extra token would indeed account for the one missing token, and we don't handle it in the script, although our existing padding inserts [PAD0], so that's close enough?
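A quick way to see the mismatch locally, assuming the standard Hugging Face file names, is to compare the declared vocab size with the number of distinct token ids in tokenizer.json; a small sketch:

```python
# Diagnostic sketch: compare config.json's vocab_size with the tokenizer's
# actual token count (run inside the downloaded model directory).
import json

with open("config.json") as f:
    declared = json.load(f)["vocab_size"]

with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

ids = set(tok["model"]["vocab"].values())
ids.update(t["id"] for t in tok.get("added_tokens", []))

print("config.json vocab_size:", declared)
print("tokenizer token count :", len(ids))
# For this model the two reportedly differ by one: the [PAD] token at id 65024
# has a row in the embedding table but no entry in the tokenizer's token list.
```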

KerfuffleV2 commented 1 year ago

FWIW, the config specifies token IDs 1 and 2

although there is an additional tokenizer_config.json which seems to override EOS

Weird. For extracting special token IDs like BOS/EOS, tokenizer.json and tokenizer_config.json get tried first and config.json is tried as a fallback.
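As a rough illustration of that lookup order (a simplified, hypothetical sketch, not the real gguf package code):

```python
# Simplified sketch of the BOS/EOS lookup order described above.
import json
from pathlib import Path

def load_json(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

def find_special_id(model_dir: str, key: str):
    """key is 'bos' or 'eos'; returns a token id or None."""
    d = Path(model_dir)
    tokenizer = load_json(d / "tokenizer.json")
    tok_cfg = load_json(d / "tokenizer_config.json")
    config = load_json(d / "config.json")

    # 1) tokenizer_config.json may name the token (e.g. "<|endoftext|>");
    #    resolve that name to an id through tokenizer.json's added_tokens.
    name = tok_cfg.get(f"{key}_token")
    if isinstance(name, dict):  # sometimes stored as {"content": "...", ...}
        name = name.get("content")
    if name:
        for tok in tokenizer.get("added_tokens", []):
            if tok.get("content") == name:
                return tok["id"]

    # 2) Otherwise fall back to the raw id in config.json (bos_token_id / eos_token_id).
    return config.get(f"{key}_token_id")

# e.g. find_special_id("WizardLM-Uncensored-Falcon-40b", "eos")
# reportedly resolves to 11 ('<|endoftext|>') for this model.
```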

akawrykow commented 1 year ago

@KerfuffleV2 actually it seems like it should work now after https://github.com/ggerganov/llama.cpp/pull/2842.

It seems like previously we just used the IDs from config.json, but now there is the extra step of looking at the added_tokens of tokenizer_config.json. For the model in question, the eos token is there at least, so it should get mapped back to the correct type.

I wonder whether publishing the updated module to pip, installing that, and then re-running the convert script would make this work.

akawrykow commented 1 year ago

Yes, confirmed after upgrading the gguf package with an extra print:

gguf: Setting special token type eos to 11

@KerfuffleV2 @ggerganov does it make sense to fall back to BOS = EOS when we have a 'special' EOS token? Is that a convention that these models are following implicitly?

KerfuffleV2 commented 1 year ago

does it make sense to fall back to BOS = EOS when we have a 'special' EOS token?

Unfortunately, I don't know enough to answer that question. It sounds kind of reasonable, but it probably really depends on how the model is trained.

I wonder if you publish the updated module to pip

I don't have that capability (but I should have done a better job of making sure that happened in sync with my changes). Hopefully #2916 fixed your issues. Sorry about the breakage!

rlanday commented 1 year ago

I updated to the latest gguf and revision 92d0b751a77a089e650983e9f1564ef4d31b32b9 and verified that llama.cpp now produces this output when loading the converted model:

llm_load_print_meta: general.name   = Falcon
llm_load_print_meta: BOS token = 11 '<|endoftext|>'
llm_load_print_meta: EOS token = 11 '<|endoftext|>'
llm_load_print_meta: LF token  = 193 '
'
llm_load_tensors: ggml ctx size =    0.16 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  8192, 65024, got  8192, 65025,     1,     1
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Volumes/Storage/ML models/WizardLM-Uncensored-Falcon-40b/ggml-model-f16.gguf'
main: error: unable to load model

I then applied the change in https://github.com/ggerganov/llama.cpp/issues/2894#issuecomment-1699359722 and reconverted the model, and was able to get it working (the model loads and produces coherent output).

Philipp-Sc commented 1 year ago

I am having a very similar issue, but I use convert-llama-hf-to-gguf.py

llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32001
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 2048
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly F16 (guessed)
llm_load_print_meta: model size     = 6.74 B
llm_load_print_meta: general.name   = merged_adapters_11300
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  4096, 32001, got  4096, 32000,     1,     1

there is no obvious equivalent to vocab_size = hparams["vocab_size"] in convert-llama-hf-to-gguf.py

is there a fix required for convert-llama-hf-to-gguf.py? (If not it's probably a configuration mistake on my part)

KerfuffleV2 commented 1 year ago

is there a fix required for convert-llama-hf-to-gguf.py?

I think it just doesn't work currently. Try using the main convert.py script. There's a pull request to remove those convert-llama scripts since apparently they are non-functional.

akawrykow commented 1 year ago

I updated to the latest gguf and revision 92d0b75 and verified that llama.cpp now produces this output when loading the converted model:

[same output as quoted in the comment above]

I then applied the change in #2894 (comment) and reconverted the model, and was able to get it working (the model loads and produces coherent output).

cc @ggerganov shall we merge https://github.com/ggerganov/llama.cpp/pull/2914 ?

Philipp-Sc commented 1 year ago

@KerfuffleV2 convert.py fails for another reason; any idea what this is about?

ubuntu@host:~/llama.cpp$ python3 convert.py ../merged_adapters_11300/
Traceback (most recent call last):
  File "convert.py", line 533, in <module>
    LazyModel = dict[str, LazyTensor]
TypeError: 'type' object is not subscriptable

ubuntu@host:~/llama.cpp$ ls ../merged_adapters_11300
added_tokens.json  generation_config.json            pytorch_model-00002-of-00002.bin  special_tokens_map.json  tokenizer.model
config.json        pytorch_model-00001-of-00002.bin  pytorch_model.bin.index.json      tokenizer.json           tokenizer_config.json

thanks in advance.

KerfuffleV2 commented 1 year ago

convert.py fails for another reason, any idea what this is about?

Uhhh, actually can't blame me for that one! Looks like @Cebtenzzre changed it from Dict to dict in #2916. If you add Dict to the list of imports from typing near the top and change that line to use Dict rather than dict does it work?

Or maybe simpler as a quick fix, I think you can just make it LazyModel = dict (typechecking tools won't like this but it shouldn't have a runtime impact).
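For context, subscripting the built-in dict (PEP 585 generics) only works on Python 3.9 and newer, which is why older interpreters raise exactly this TypeError. A small sketch of the two workarounds mentioned above:

```python
# On Python 3.8 and earlier this line fails at runtime:
#     LazyModel = dict[str, LazyTensor]
#     TypeError: 'type' object is not subscriptable
#
# Workaround 1: use the typing alias, which also works on older Pythons.
from typing import Dict


class LazyTensor:  # stand-in for the class defined in convert.py
    pass


LazyModel = Dict[str, LazyTensor]

# Workaround 2 (quick hack): drop the type parameters entirely; only static
# type checkers care about them, so runtime behaviour is unchanged.
# LazyModel = dict
```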

Philipp-Sc commented 1 year ago

@KerfuffleV2 thanks for your quick reply. I found the issue: I had to update Python to version 3.9.

Everything works now :)

cebtenzzre commented 1 year ago

I think you can just make it LazyModel = dict

You can actually remove that line entirely if you just want it to run, it's only used by the type checker.

Fixed in PR #2949.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

Jingy2000 commented 5 months ago

I found a related discussion here that might be helpful:

https://huggingface.co/TheBloke/CodeLlama-7B-Python-GGUF/discussions/1