scarydemon2 opened 10 months ago
It looks like BloomForCausalLM has a similar situation to many GPT-family models, where model checkpoints can, but do not always, have a prefix (`transformer.`) on tensor names. For example, compare projecte-aina/FLOR-6.3B and bigscience/bloom-7b1: FLOR has the embedding named `transformer.word_embeddings.weight`, while base BLOOM has it named just `word_embeddings.weight`.
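The two naming schemes differ only by that prefix. As an illustration (a minimal sketch, not mergekit's actual fix), normalizing the two conventions amounts to stripping the prefix when it is present:

```python
# Illustration only: the same logical tensors under the two naming schemes.
flor_style = ["transformer.word_embeddings.weight", "transformer.ln_f.weight"]
bloom_style = ["word_embeddings.weight", "ln_f.weight"]

def strip_prefix(name: str, prefix: str = "transformer.") -> str:
    """Drop a leading module prefix from a tensor name, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name

# Both conventions normalize to the same set of names.
normalized = [strip_prefix(n) for n in flor_style]
```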
I am working on a more general fix for this sort of situation, but right now mergekit expects an architecture to have one consistent naming format. In the meantime, I'd try just loading and re-serializing the models like so:
```python
import transformers, torch

model = transformers.AutoModelForCausalLM.from_pretrained(
    "your/model", torch_dtype=torch.float16
)
model.save_pretrained("/workspace/model_reexport", safe_serialization=True)
```
If you do this to each model they should all be written with consistent names. I know this is hacky and inconvenient - sorry about that!
Hi @scarydemon2, may I ask if you managed to solve this? I am trying to do something similar. Thanks!
I first added BLOOM_INFO in architecture.py:

And then I ran:
But an error occurred:

```
Traceback (most recent call last):
  File "/opt/conda/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/options.py", line 59, in wrapper
    f(*args, **kwargs)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/merge.py", line 110, in run_merge
    exec.run(
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 261, in run
    for ref, tensor in tqdm.tqdm(self.generate_tensors(), total=len(self.targets)):
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 280, in generate_tensors
    schedule = self._schedule_ops()
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 376, in _schedule_ops
    dependencies, ops = self._build_dependencies()
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 416, in _build_dependencies
    _visit(target)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 413, in _visit
    _visit(dependency)
  File "/home/gaotianhao1/alpaca-lora/mergekit/mergekit/graph.py", line 405, in _visit
    raise RuntimeError(f"No rule to produce {node}")
RuntimeError: No rule to produce path1:word_embeddings.weight
```
So I'd like to know the right process for adding a BLOOM architecture. Looking forward to your help!