@SkafteNicki
I also face a similar issue with the TensorBoard logger whenever the logger flag is left at its default, on both the GPU and TPU Colab runtimes. It throws the following exception on the TPU runtime:
Exception in device=TPU:0: dictionary update sequence element #0 has length 1; 2 is required
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
fn(gindex, *args)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/distrib_parts.py", line 531, in tpu_train
self.run_pretrain_routine(model)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 980, in run_pretrain_routine
self.train()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
self.run_training_epoch()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py", line 465, in run_training_epoch
self.log_metrics(batch_step_metrics, grad_norm_dic)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/logging.py", line 74, in log_metrics
self.logger.save()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py", line 10, in wrapped_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/loggers/tensorboard.py", line 161, in save
save_hparams_to_yaml(hparams_file, self.hparams)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/saving.py", line 151, in save_hparams_to_yaml
yaml.dump(hparams, fp)
File "/usr/local/lib/python3.6/dist-packages/yaml/__init__.py", line 200, in dump
return dump_all([data], stream, Dumper=Dumper, **kwds)
File "/usr/local/lib/python3.6/dist-packages/yaml/__init__.py", line 188, in dump_all
dumper.represent(data)
File "/usr/local/lib/python3.6/dist-packages/yaml/representer.py", line 26, in represent
node = self.represent_data(data)
File "/usr/local/lib/python3.6/dist-packages/yaml/representer.py", line 47, in represent_data
node = self.yaml_representers[data_types[0]](self, data)
File "/usr/local/lib/python3.6/dist-packages/yaml/representer.py", line 205, in represent_dict
return self.represent_mapping('tag:yaml.org,2002:map', data)
File "/usr/local/lib/python3.6/dist-packages/yaml/representer.py", line 116, in represent_mapping
node_value = self.represent_data(item_value)
File "/usr/local/lib/python3.6/dist-packages/yaml/representer.py", line 51, in represent_data
node = self.yaml_multi_representers[data_type](self, data)
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Similarly, on the GPU runtime it throws an exception saying can't pickle _thread.lock objects.
I resolved the issue by setting logger=False.
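For reference, a minimal sketch of that workaround, assuming a LightningModule named MyModel (a hypothetical placeholder for your own model); disabling the logger means the default TensorBoardLogger never tries to dump hparams to YAML:

```python
from pytorch_lightning import Trainer

# MyModel is a placeholder for your own LightningModule.
model = MyModel()

# logger=False disables the default TensorBoardLogger, so
# save_hparams_to_yaml is never called and the exception above is avoided.
trainer = Trainer(logger=False)
trainer.fit(model)
```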
🐛 Bug
DDP breaks LR finder
To Reproduce
At first I thought it was because configure_optimizers returns [opt], [sched], but returning opt alone still causes the error. Training works correctly with the same code. A rough reproduction sketch is below.
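The sketch below is illustrative only: the minimal model (BoringModel, its random training data, and the self.lr attribute) is made up for this example, and the Trainer flags shown (gpus, distributed_backend="ddp", auto_lr_find) follow the Lightning API of that release line and may be named differently in newer versions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class BoringModel(pl.LightningModule):
    """Minimal illustrative model (hypothetical), just enough to train."""

    def __init__(self, lr=1e-3):
        super().__init__()
        self.lr = lr  # attribute the LR finder is expected to tune
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self(x), y)
        return {"loss": loss}

    def train_dataloader(self):
        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        return DataLoader(dataset, batch_size=8)

    def configure_optimizers(self):
        opt = torch.optim.Adam(self.parameters(), lr=self.lr)
        sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1)
        return [opt], [sched]  # returning just `opt` errors out the same way


model = BoringModel()

# Plain training under ddp works; enabling the LR finder under ddp triggers the error.
trainer = pl.Trainer(gpus=2, distributed_backend="ddp", auto_lr_find=True, max_epochs=1)
trainer.fit(model)
```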