arcee-ai / DAM

RuntimeError During Merging Process Possibly Due to Shared Memory Tensors #41

Open SolshineCode opened 3 hours ago

SolshineCode commented 3 hours ago

Description: I'm encountering an error while trying to merge models using the merge.py script. The process loads the models and processes the layers correctly, but when it attempts to save the merged model, a RuntimeError is raised because some tensors share memory. The detailed log is included below.

This issue occurs when running the following command in a notebook:

!python dam/merge.py \
  "cerebras/Cerebras-GPT-111M" \
  "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
  --output_path "/content/merged_model" \
  --device "cuda" \
  --repo_id "Solshine/Cerebras-GPT-111M-DAM-test-untrained-merge"

Log Output:

Loading base model: cerebras/Cerebras-GPT-111M
Loading models to merge:
Loading models: 100% 2/2 [00:01<00:00,  1.25it/s]
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 4029.11it/s]
Processing linear layers: 100% 1/1 [00:00<00:00,  3.68it/s]
Total number of parameters: 226845696
Total number of trainable parameters: 72454656
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.wte.embeddings.0', 'lm_head.weights.0'}, {'transformer.wte.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
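
For context, the two pairs named in the error look like the usual tying between the input embeddings and the LM head (GPT-2-style models such as Cerebras-GPT tie these weights), and the DAM merge appears to keep that sharing. A minimal sketch outside of DAM that reproduces the same refusal from safetensors, assuming tied weights are indeed the cause:

# Minimal sketch (not DAM code): safetensors refuses to save two entries
# that point at the same underlying storage, which is what weight tying creates.
import torch
from safetensors.torch import save_file

emb = torch.nn.Embedding(8, 4)
lm_head = torch.nn.Linear(4, 8, bias=False)
lm_head.weight = emb.weight  # tie the weights, GPT-style

try:
    save_file({"wte.weight": emb.weight, "lm_head.weight": lm_head.weight},
              "tied.safetensors")
except RuntimeError as e:
    print(e)  # "Some tensors share memory, this will lead to duplicate memory on disk..."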

Reproduction Steps:

  1. Run merge.py script with the following parameters:
    !python dam/merge.py \
      "cerebras/Cerebras-GPT-111M" \
      "cerebras/Cerebras-GPT-111M" "Corianas/111m" \
      --output_path "/content/merged_model" \
      --device "cuda" \
      --repo_id "Solshine/DAM-test-untrained-merge"
  2. The error occurs during the save_pretrained() call when the merged model is being saved.

Expected Behavior: The merged model should save correctly without errors.

Actual Behavior: The process fails during the save step due to the model having tensors that share memory. The error suggests using save_model to handle shared tensors more appropriately.
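
A possible workaround, following the hint in the error message (sketch only, not yet verified against merge.py): either fall back to the torch serialization path, which tolerates shared tensors, or call safetensors' save_model, which deduplicates them, at the point where merge.py calls save_pretrained:

# Sketch of two possible workarounds at merge.py's save step
# (merged_model / output_path are the names taken from the traceback; not verified).

# Option 1: skip safetensors and write pytorch_model.bin instead
merged_model.save_pretrained(output_path, safe_serialization=False)

# Option 2: let safetensors deduplicate the shared tensors itself
from safetensors.torch import save_model
save_model(merged_model, f"{output_path}/model.safetensors")
# Note: save_model only writes the weights; config/tokenizer still need saving separately.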

Troubleshooting Attempts:

Request for Help:

Any guidance or suggestions to resolve this issue would be greatly appreciated!

Thank you for your time and help!

SolshineCode commented 2 hours ago

This also happens with Qwen/Qwen2.5-0.5B

Loading base model: Qwen/Qwen2.5-0.5B
config.json: 100% 681/681 [00:00<00:00, 4.94MB/s]
model.safetensors: 100% 988M/988M [00:06<00:00, 146MB/s]
generation_config.json: 100% 138/138 [00:00<00:00, 1.02MB/s]
Loading models to merge:
Loading models:  50% 1/2 [00:01<00:01,  1.27s/it]
config.json: 100% 729/729 [00:00<00:00, 4.58MB/s]

model.safetensors: 100% 988M/988M [00:04<00:00, 226MB/s]

generation_config.json: 100% 117/117 [00:00<00:00, 710kB/s]
Loading models: 100% 2/2 [00:07<00:00,  3.81s/it]
tokenizer_config.json: 100% 7.23k/7.23k [00:00<00:00, 38.7MB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 10.6MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 23.2MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 19.3MB/s]
Processing layer norms: 0it [00:00, ?it/s]
Processing embedding layers: 100% 2/2 [00:00<00:00, 19784.45it/s]
Processing linear layers: 100% 169/169 [00:01<00:00, 121.07it/s]
Total number of parameters: 1260786192
Total number of trainable parameters: 537360
Saving merged model to /content/merged_model
Traceback (most recent call last):
  File "/content/DAM/dam/merge.py", line 267, in <module>
    main()
  File "/content/DAM/dam/merge.py", line 252, in main
    merge_models(args.base_model_id, 
  File "/content/DAM/dam/merge.py", line 215, in merge_models
    merged_model.save_pretrained(output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2793, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 286, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/usr/local/lib/python3.10/dist-packages/safetensors/torch.py", line 488, in _flatten
    raise RuntimeError(
RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'lm_head.weights.0', 'model.embed_tokens.embeddings.0'}, {'model.embed_tokens.embeddings.1', 'lm_head.weights.1'}].
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
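
For what it's worth, Qwen2.5-0.5B appears to tie its input embeddings to the LM head as well (tie_word_embeddings in its config), so the same shared-storage refusal would be expected there. A quick check on the unmerged base model (sketch, standard Hugging Face attribute names assumed):

# Sketch: confirm the base model ties embed_tokens and lm_head
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")
print(config.tie_word_embeddings)  # expected: True

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
# same storage -> safetensors will refuse to save both names separately
print(model.lm_head.weight.data_ptr() ==
      model.get_input_embeddings().weight.data_ptr())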