w601sxs opened this issue 6 months ago
Maybe an example of how to frankenmerge with passthrough?
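Something like the following is what I'm trying to do (a rough sketch stacking overlapping layer slices of a BERT checkpoint; the model name and layer ranges are purely illustrative):

```yaml
dtype: float16
merge_method: passthrough
slices:
  - sources:
      - layer_range: [0, 8]
        model: bert-base-uncased
  - sources:
      - layer_range: [4, 12]
        model: bert-base-uncased
```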
Hey there, thank you for the detailed issue. This is definitely a bug.

For now, a quick fix to get things working on your end is to open `mergekit/_data/architectures/bert.json` and replace all instances of `bert.` with an empty string. That should hopefully get you going with your current config.

That said, we will be putting out a proper bug fix soon.
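If it helps, that edit can be scripted (a minimal sketch; the path assumes you're running from the root of a local mergekit checkout):

```python
from pathlib import Path

# mergekit's BERT architecture definition, relative to the repo root
arch_file = Path("mergekit/_data/architectures/bert.json")

# Strip every "bert." prefix from the tensor names, per the workaround above
arch_file.write_text(arch_file.read_text().replace("bert.", ""))
```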
I'll try that in a local branch and wait for the fix! thanks
Thanks for the bug report! PR #295 should fix this issue. If you run into any further trouble please let me know - the BERT support is quite fresh and I appreciate knowing where it fails.
Hi Charles! Thanks for the great work!
I am encountering similar issues.
I am using the phi-1 and phi-1.5 models; my config YAML file is as follows:
```yaml
dtype: float16
merge_method: passthrough
slices:
  - sources:
      - layer_range: [0, 8]
        model: microsoft/phi-1
  - sources:
      - layer_range: [4, 12]
        model: microsoft/phi-1
  - sources:
      - layer_range: [8, 16]
        model: microsoft/phi-1
  - sources:
      - layer_range: [12, 20]
        model: microsoft/phi-1
  - sources:
      - layer_range: [16, 24]
        model: microsoft/phi-1
  - sources:
      - layer_range: [20, 28]
        model: microsoft/phi-1
  - sources:
      - layer_range: [24, 32]
        model: microsoft/phi-1
```
Both phi-1 and phi-1.5 give me the following error. (I also tried TinyLlama, which gave me the same issue.)
RuntimeError: Tensor model.layers.31.mlp.fc2.weight required but not present in model microsoft/phi-1_5
In addition, how can I run the same YAML config for the phi-3 model, whose architecture is currently not included in the package?
Thanks! @cg123
@yaof20 This is because `microsoft/phi-1` only has 24 layers, but you're telling mergekit to look for 32 in total. If you adjust your config to use only layers 0-24 instead, it should work properly.
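For example, here's one way to trim the config above so that no slice reaches past layer 24 (keeping the same 8-layer windows with 4-layer overlap; the exact slicing is just illustrative):

```yaml
dtype: float16
merge_method: passthrough
slices:
  - sources:
      - layer_range: [0, 8]
        model: microsoft/phi-1
  - sources:
      - layer_range: [4, 12]
        model: microsoft/phi-1
  - sources:
      - layer_range: [8, 16]
        model: microsoft/phi-1
  - sources:
      - layer_range: [12, 20]
        model: microsoft/phi-1
  - sources:
      - layer_range: [16, 24]
        model: microsoft/phi-1
```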
As for Phi-3 - I'll add support for it in the next couple of days!
Thanks for the reply!
I'm trying to merge some embedding models with the config file below. The architectures are similar, but I think it is erroring out on the names of some layers. I would love some suggestions on how to change the YAML to make it work.
YAML config:

Error:

CLI used: