Closed: jim-plus closed this issue 3 months ago
https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german
Also found this happening with Model Stock merges and Gemma2 2B.
In the case of the 9B, the fault appears to reside in the first safetensors shard: there's a spurious lm_head.weight tensor that should be removed both from that shard and from model.safetensors.index.json. After that, the model size is what it should be.
Beat me to it; the same thing is happening here with lm_head.weight for the 2B model.
Looks like it's likely something related to handling of the tokenizer source.
And how can the duplicate lm_head.weight be removed, so I can merge uncensored models for max uncensorship?
@piotr25691 Remove the entry for it from your index.json using whatever code editor you like, and for the model file itself you can edit it directly with the safetensors package. Here's a simplified script that will do it for you:
```python
from safetensors import safe_open
from safetensors.torch import save_file
import torch

# Path to your SafeTensors file
input_file = "path/to/your/model-00001-of-00002.safetensors"
output_file = "path/to/your/fixed-model-00001-of-00002.safetensors"

# Load every tensor except the spurious lm_head.weight
tensors = {}
with safe_open(input_file, framework="pt", device="cpu") as f:
    for key in f.keys():
        if key != "lm_head.weight":
            tensors[key] = f.get_tensor(key)

# Save the modified tensors
save_file(tensors, output_file)
print(f"SafeTensors file without lm_head saved to {output_file}")

# Optionally, verify the removal
with safe_open(output_file, framework="pt", device="cpu") as f:
    if "lm_head.weight" not in f.keys():
        print("lm_head.weight successfully removed")
    else:
        print("Warning: lm_head.weight still present")
```
It's because the (transpose of?) lm_head is used as the embedding weights too: https://github.com/ggerganov/llama.cpp/issues/9065
IIRC, the command-r models also reuse the lm_head like this.
Resulting model weights and SLERP merge formula here: https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B
An exl2 quant of the above works, but where did the extra 1B parameters come from?