PamKing7 closed this issue 3 months ago
I have the same issue. Did you solve it?
not yet
Can you share your library versions? Use `pip freeze > versions.txt`
Sorry, my conda environment has been deleted. I was just following the instructions in the README to run the script:
```shell
conda create -n redteam python=3.10
git clone git@github.com:Improbable-AI/curiosity_redteam.git
cd custom_trlx
pip install -e .
cd ..
pip install -r requirements.txt
export PYTHONPATH=$(pwd):$PYTHONPATH
python experiments/imdb_toxicity_response/run_ppo.py --mode local --gpus 0
```
When I run `python experiments/imdb_toxicity_response/run_ppo.py --mode local --gpus 0`, checkpoint saving fails with the `RuntimeError` shown in the traceback below. How should I modify the save function?

```
Traceback (most recent call last):
  File "/mnt/data101_d2/wpy/curiosity_redteam/ppo_gpt2_gpt2_imdb_toxicity_response.py", line 196, in <module>
    main(hparams)
  File "/mnt/data101_d2/wpy/curiosity_redteam/ppo_gpt2_gpt2_imdb_toxicity_response.py", line 186, in main
    trlx.train(
  File "/mnt/data101_d2/wpy/curiosity_redteam/custom_trlx/trlx/trlx.py", line 129, in train
    trainer.learn()
  File "/mnt/data101_d2/wpy/curiosity_redteam/custom_trlx/trlx/trainer/accelerate_base_trainer.py", line 636, in learn
    self.save(directory)
  File "/mnt/data101_d2/wpy/curiosity_redteam/custom_trlx/trlx/trainer/accelerate_base_trainer.py", line 314, in save
    self.accelerator.save_state(dst_dir, **kwargs)
  File "/home/tangbo/.local/lib/python3.12/site-packages/accelerate/accelerator.py", line 2825, in save_state
    save_location = save_accelerator_state(
  File "/home/tangbo/.local/lib/python3.12/site-packages/accelerate/checkpointing.py", line 99, in save_accelerator_state
    save(state, output_model_file, save_on_each_node=save_on_each_node, safe_serialization=safe_serialization)
  File "/home/tangbo/.local/lib/python3.12/site-packages/accelerate/utils/other.py", line 205, in save
    save_func(obj, f)
  File "/home/tangbo/.local/lib/python3.12/site-packages/safetensors/torch.py", line 284, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/home/tangbo/.local/lib/python3.12/site-packages/safetensors/torch.py", line 480, in _flatten
    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'base_model.lm_head.weight', 'base_model.transformer.wte.weight'}]. A potential way to correctly save your model is to use save_model. More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
```
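For context on what the error means: GPT-2 ties `lm_head.weight` to `transformer.wte.weight`, so a plain `state_dict` contains two keys pointing at one storage buffer, and safetensors refuses to serialize aliased storage. The usual workarounds are to save with something that handles tied weights (e.g. the model's `save_pretrained`) or, if the library version supports it, to disable safe serialization in the checkpoint call (e.g. `accelerator.save_state(dst_dir, safe_serialization=False)` — check your `accelerate` version's signature). The sketch below is a pure-Python illustration of the mechanism, not the real trlx or safetensors API: sharing is modelled by object identity where the real library compares storage pointers, and `drop_tied_aliases` mimics what tied-weight-aware savers do by keeping one name per buffer.

```python
def find_shared(tensors):
    """Group keys whose values alias the same underlying storage.

    Storage is modelled here by id(); real safetensors compares
    torch storage pointers before serializing.
    """
    by_storage = {}
    for name, t in tensors.items():
        by_storage.setdefault(id(t), []).append(name)
    return [set(names) for names in by_storage.values() if len(names) > 1]


def drop_tied_aliases(tensors, keep="base_model.transformer.wte.weight"):
    """Keep only one key per shared buffer, preferring `keep`.

    Roughly what tied-weight-aware save paths do: the embedding
    weight is written once and the tied head is re-tied on load.
    """
    out, seen = {}, set()
    # Stable sort puts the preferred key first so it wins the dedup.
    for name in sorted(tensors, key=lambda n: n != keep):
        ptr = id(tensors[name])
        if ptr not in seen:
            seen.add(ptr)
            out[name] = tensors[name]
    return out


# Simulated tied embedding: both keys reference the same buffer.
wte = [0.1, 0.2]
state = {
    "base_model.transformer.wte.weight": wte,
    "base_model.lm_head.weight": wte,  # tied alias of wte
    "base_model.transformer.h.0.attn.weight": [0.3],
}

# This is the condition the safetensors error reports.
print(find_shared(state))
# → [{'base_model.transformer.wte.weight', 'base_model.lm_head.weight'}]

# After deduplication there is nothing left to complain about.
deduped = drop_tied_aliases(state)
print(find_shared(deduped))  # → []
print(sorted(deduped))
```

If you patch the save path in `custom_trlx`, the safest option is whichever one preserves the tying on reload; simply writing both copies to disk is what safetensors is deliberately preventing.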