haomengt opened this issue 1 week ago
Hello, sorry, I haven't encountered this error myself. Maybe this issue can help you: "ValueError: The generation config instance is invalid". Quoting from it:

"This error appears to be caused by upgrading the transformers version. I fixed it by manually adding `do_sample: true` to vicuna's generation_config.json file."
Hello author, I'd like to ask: when I run training with your training command, why does it always fail at checkpoint-2400? The error is shown below:

```
{'loss': 9.7984, 'grad_norm': 13.973094940185547, 'learning_rate': 0.000741541788969566, 'epoch': 0.03}
  1%|          | 2400/215760 [2:58:49<265:49:35, 4.49s/it]
output_dir /apps/data/models/urbangpt/UrbanGPT/checkpoints/UrbanGPT/checkpoint-2400
up /apps/data/models/urbangpt/UrbanGPT/checkpoints/UrbanGPT/st_projector checkpoint-2400
Traceback (most recent call last):
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/generation/configuration_utils.py", line 771, in save_pretrained
    raise ValueError(str([w.message for w in caught_warnings]))
ValueError: [UserWarning('`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.'), UserWarning('`do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/apps/data/models/urbangpt/UrbanGPT/urbangpt/train/train_mem.py", line 30, in <module>
    train()
  File "/apps/data/models/urbangpt/UrbanGPT/urbangpt/train/train_st.py", line 822, in train
    trainer.train()
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 2356, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 2807, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 2886, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 3454, in save_model
    self._save(output_dir)
  File "/apps/data/models/urbangpt/UrbanGPT/urbangpt/train/stchat_trainer.py", line 56, in _save
    super(STChatTrainer, self)._save(output_dir, state_dict)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/trainer.py", line 3525, in _save
    self.model.save_pretrained(
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2593, in save_pretrained
    model_to_save.generation_config.save_pretrained(save_directory)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/transformers/generation/configuration_utils.py", line 773, in save_pretrained
    raise ValueError(
ValueError: The generation config instance is invalid -- `.validate()` throws warnings and/or exceptions. Fix these issues to save the configuration.

Thrown during validation:
[UserWarning('`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.'), UserWarning('`do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.')]

[... the same pair of tracebacks and UserWarnings is then repeated with "rank0:" prefixes ...]

wandb: You can sync this run to the cloud by running:
wandb: wandb sync /apps/data/models/urbangpt/UrbanGPT/wandb/offline-run-20240918_231037-16ynzesq
wandb: Find logs at: wandb/offline-run-20240918_231037-16ynzesq/logs
W0919 02:09:32.451000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 492236 closing signal SIGTERM
W0919 02:09:32.452000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 492237 closing signal SIGTERM
W0919 02:09:32.452000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 492238 closing signal SIGTERM
W0919 02:09:32.452000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 492239 closing signal SIGTERM
W0919 02:09:32.453000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 492240 closing signal SIGTERM
E0919 02:09:33.696000 140088041554048 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 492235) of binary: /apps/data/conda/envs/urbanGPT/bin/python
Traceback (most recent call last):
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/run.py", line 905, in <module>
    main()
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/apps/data/conda/envs/urbanGPT/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
```
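The warnings themselves name two possible fixes: enable sampling, or reset the sampling-only flags to their defaults. A minimal sketch of both using transformers' GenerationConfig API, where `model_dir` is a placeholder for the directory whose generation config is invalid:

```python
# Sketch of the two fixes named in the UserWarnings, via the transformers
# GenerationConfig API. "model_dir" is a placeholder for the base-model or
# checkpoint directory whose generation_config.json is inconsistent.
from transformers import GenerationConfig

model_dir = "/path/to/base/model"  # placeholder

gen_cfg = GenerationConfig.from_pretrained(model_dir)

gen_cfg.do_sample = True      # fix 1: allow sample-based generation, or ...
# gen_cfg.temperature = 1.0   # fix 2: reset the sampling-only flags
# gen_cfg.top_p = 1.0         #        to their defaults instead

gen_cfg.validate()            # the same check save_pretrained() runs
gen_cfg.save_pretrained(model_dir)  # rewrites generation_config.json
```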