arcee-ai / DAM

RuntimeError During Merging Process Possibly Due to Shared Memory Tensors #41

Open SolshineCode opened 3 hours ago

SolshineCode commented 3 hours ago

Description: I'm encountering an error while trying to merge models using the merge.py script. The process loads the models and processes the layers correctly, but when it attempts to save the merged model, a RuntimeError is raised because some tensors share memory. The detailed log is included below.

This issue occurs when running the following command in a notebook:

!python dam/merge.py \
  "cerebras/Cerebras-GPT-111M" \
  "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/Cerebras-GPT-111M-DAM-test-untrained-merge"

Log Output:

Loading base model: cerebras/Cerebras-GPT-111M
Loading models to merge:
Loading models: 100% 2/2 [00:01<00:00,  1.25it/s]
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 4029.11it/s]
Processing linear layers: 100% 1/1 [00:00<00:00,  3.68it/s]
Total number of parameters: 226845696
Total number of trainable parameters: 72454656
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.wte.embeddings.0', 'lm_head.weights.0'}, {'transformer.wte.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
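
For context, the two pairs named in the error look like the usual tying between the input embeddings and the LM head (GPT-2-style models such as Cerebras-GPT tie these weights), and the DAM merge appears to keep that sharing. A minimal sketch outside of DAM that reproduces the same refusal from safetensors, assuming tied weights are indeed the cause:

# Minimal sketch (not DAM code): safetensors refuses to save two entries
# that point at the same underlying storage, which is what weight tying creates.
import torch
from safetensors.torch import save_file

emb = torch.nn.Embedding(8, 4)
lm_head = torch.nn.Linear(4, 8, bias=False)
lm_head.weight = emb.weight  # tie the weights, GPT-style

try:
    save_file({"wte.weight": emb.weight, "lm_head.weight": lm_head.weight},
              "tied.safetensors")
except RuntimeError as e:
    print(e)  # "Some tensors share memory, this will lead to duplicate memory on disk..."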

Reproduction Steps:

  1. Run merge.py script with the following parameters:
    !python dam/merge.py \
      "cerebras/Cerebras-GPT-111M" \
      "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
      --output_path "/content/merged_model" \
      --device "cuda" \
      --repo_id "Solshine/DAM-test-untrained-merge"
  2. The error occurs during the save_pretrained() call when the merged model is being saved.

Expected Behavior: The merged model should save correctly without errors.

Actual Behavior: The process fails during the save step due to the model having tensors that share memory. The error suggests using save_model to handle shared tensors more appropriately.
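
A possible workaround, following the hint in the error message (sketch only, not yet verified against merge.py): either fall back to the torch serialization path, which tolerates shared tensors, or call safetensors' save_model, which deduplicates them, at the point where merge.py calls save_pretrained:

# Sketch of two possible workarounds at merge.py's save step
# (merged_model / output_path are the names taken from the traceback; not verified).

# Option 1: skip safetensors and write pytorch_model.bin instead
merged_model.save_pretrained(output_path, safe_serialization=False)

# Option 2: let safetensors deduplicate the shared tensors itself
from safetensors.torch import save_model
save_model(merged_model, f"{output_path}/model.safetensors")
# Note: save_model only writes the weights; config/tokenizer still need saving separately.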

Troubleshooting Attempts:

Request for Help:

Any guidance or suggestions to resolve this issue would be greatly appreciated!

Thank you for your time and help!

SolshineCode commented 2 hours ago

This also happens with Qwen/Qwen2.5-0.5B

Loading base model: Qwen/Qwen2.5-0.5B
config.json: 100% 681/681 [00:00<00:00, 4.94MB/s]
model.safetensors: 100% 988M/988M [00:06<00:00, 146MB/s]
generation_config.json: 100% 138/138 [00:00<00:00, 1.02MB/s]
Loading models to merge:
Loading models:  50% 1/2 [00:01<00:01,  1.27s/it]
config.json: 100% 729/729 [00:00<00:00, 4.58MB/s]

model.safetensors: 100% 988M/988M [00:04<00:00, 226MB/s]

generation_config.json: 100% 117/117 [00:00<00:00, 710kB/s]
Loading models: 100% 2/2 [00:07<00:00,  3.81s/it]
tokenizer_config.json: 100% 7.23k/7.23k [00:00<00:00, 38.7MB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 10.6MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 23.2MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 19.3MB/s]
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19784.45it/s]
Processing linear layers: 100% 169/169 [00:01<00:00, 121.07it/s]
Total number of parameters: 1260786192
Total number of trainable parameters: 537360
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
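
For what it's worth, Qwen2.5-0.5B appears to tie its input embeddings to the LM head as well (tie_word_embeddings in its config), so the same shared-storage refusal would be expected there. A quick check on the unmerged base model (sketch, standard Hugging Face attribute names assumed):

# Sketch: confirm the base model ties embed_tokens and lm_head
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")
print(config.tie_word_embeddings)  # expected: True

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
# same storage -> safetensors will refuse to save both names separately
print(model.lm_head.weight.data_ptr() ==
      model.get_input_embeddings().weight.data_ptr())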