AshD opened 2 months ago
This is an expected failure. Miqu and Mistral Large have different hidden state sizes, so their layers can't be used interchangeably. In general, models need to be of the same architecture and family to produce a valid result.
I want to merge Mistral Large with https://huggingface.co/softwareweaver/Twilight-Miqu-146B by adding some layers from Twilight Miqu to Mistral Large using the passthrough method. Is there a better way to do this?
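For context, a passthrough merge in mergekit is declared with a `slices` list of layer ranges. The sketch below is purely illustrative: the layer ranges and the Mistral Large repo name are my assumptions, not a validated recipe (and, per the reply above, the hidden-size mismatch means this particular pairing won't produce a working model):

```yaml
# Hypothetical passthrough config sketch -- layer ranges and the
# Mistral Large model id are placeholders, not a tested recipe.
slices:
  - sources:
      - model: mistralai/Mistral-Large-Instruct-2407
        layer_range: [0, 40]
  - sources:
      - model: softwareweaver/Twilight-Miqu-146B
        layer_range: [40, 60]
  - sources:
      - model: mistralai/Mistral-Large-Instruct-2407
        layer_range: [40, 88]
merge_method: passthrough
dtype: bfloat16
```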
The merge succeeds when using `--allow-crimes`, but the resulting GGUF model fails to run, and loading it with transformers fails as well.

GGUF runtime error:

```
RuntimeError: shape '[96, 2, 42, 8192]' is invalid for input of size 67108864
```
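The numbers in that error are consistent with the hidden-size mismatch described above: the requested view needs 96·2·42·8192 elements, but the tensor actually holds 67,108,864 = 8192·8192 elements, i.e. an 8192-wide square weight of the kind a Miqu/Llama-70B layer would carry. A quick check of the arithmetic, assuming nothing beyond the figures in the message:

```python
# Element counts implied by the RuntimeError message.
target_shape = (96, 2, 42, 8192)   # shape the loader tried to view the tensor as
target_elems = 1
for dim in target_shape:
    target_elems *= dim

actual_elems = 67108864            # size reported in the error

print(target_elems)                # 66060288, what the target shape requires
print(actual_elems == 8192 * 8192) # True: the tensor is 8192 x 8192
print(target_elems == actual_elems)  # False, hence the invalid-view error
```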
Transformers loading error:

```
size mismatch for model.layers.151.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 12288]) from checkpoint, the shape in current model is torch.Size([28672, 8192])
```
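The transformers error points at the same root cause: `gate_proj` maps hidden_size to intermediate_size, and the checkpoint tensor has hidden width 12288 where the instantiated model expects 8192 (my reading: 12288 is Mistral Large's hidden size, 8192 is Miqu/Llama-70B's). A minimal sketch of the strict shape comparison a state-dict loader effectively performs, using only the sizes from the error message:

```python
# Shapes from the error for model.layers.151.mlp.gate_proj.weight.
checkpoint_shape = (28672, 12288)  # intermediate_size x hidden_size in the checkpoint
model_shape = (28672, 8192)        # what the instantiated model allocated

def shapes_compatible(a, b):
    """Mimic the exact-match check applied when copying a parameter."""
    return a == b

print(shapes_compatible(checkpoint_shape, model_shape))  # False -> "size mismatch"
```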
Merge config: