Closed: jim-plus closed this issue 3 months ago
https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german
Also found this happening with Model Stock merges and Gemma2 2B.
In the case of the 9B, the fault appears to reside in the first safetensors shard: there's a spurious lm_head.weight tensor that should be removed both from that shard and from model.safetensors.index.json. After that, the model size is what it should be.
Beat me to it; the same thing is happening here with lm_head.weight for the 2B model.
Looks like it's likely something related to handling of the tokenizer source.
And how can the duplicate lm_head.weight be removed, so I can merge uncensored models for max uncensorship?
@piotr25691 Remove the entry for it from your index.json using whatever code editor you like, and for the model file itself you can edit it directly with the safetensors package. Here's a simplified script that will do it for you:
```python
from safetensors import safe_open
from safetensors.torch import save_file
import torch

# Path to your SafeTensors file
input_file = "path/to/your/model-00001-of-00002.safetensors"
output_file = "path/to/your/fixed-model-00001-of-00002.safetensors"

# Load every tensor except the spurious lm_head.weight
tensors = {}
with safe_open(input_file, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key != "lm_head.weight":
            tensors[key] = f.get_tensor(key)

# Save the modified tensors
save_file(tensors, output_file)
print(f"SafeTensors file without lm_head saved to {output_file}")

# Optionally, verify the removal
with safe_open(output_file, framework="pt", device="cpu") as f:
    if "lm_head.weight" not in f.keys():
        print("lm_head.weight successfully removed")
    else:
        print("Warning: lm_head.weight still present")
```
It's because the (transpose of?) lm_head is used as the embedding weights too: https://github.com/ggerganov/llama.cpp/issues/9065
IIRC, the command-r models also reuse the lm_head like this.
Resulting model weights and SLERP merge formula here: https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B
An exl2 quant of the above works, but where did the extra 1B parameters come from?