Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
28.01k stars 3.36k forks source link

Logger emits exception when there's `None` in hparams #984

Closed kyoungrok0517 closed 4 years ago

kyoungrok0517 commented 4 years ago

To Reproduce

My hparams:

{
    'n': [8000],
    'k': [30],
    'batch_size': 512,
    'data_dir': '/Users/kyoungrok/Resilio Sync/Dataset/2019 TREC/passage_ranking/dataset',
    'max_nb_epochs': 500,
    'learning_rate': 0.0001,
    'nodes': 1,
    'distributed_backend': None,
    'eval_test_set': False,
    'check_val_every_n_epoch': 1,
    'accumulate_grad_batches': 1,
    'max_epochs': 200,
    'min_epochs': 2,
    'train_percent_check': 1.0,
    'val_percent_check': 1.0,
    'test_percent_check': 1.0,
    'val_check_interval': 0.95,
    'log_save_interval': 100,
    'row_log_interval': 100,
    'enable_early_stop': True,
    'early_stop_metric': 'val_acc',
    'early_stop_mode': 'min',
    'early_stop_patience': 3,
    'gradient_clip_val': -1,
    'track_grad_norm': -1,
    'model_save_path': '/Users/kyoungrok/Desktop/trec-2019-deep-learning/trec2019/sparse/sparsenet/model_weights',
    'model_save_monitor_value': 'val_acc',
    'model_save_monitor_mode': 'max',
    'model_load_weights_path': None,
    'tt_name': 'pt_test',
    'tt_description': 'pytorch lightning test',
    'tt_save_path': '/Users/kyoungrok/Desktop/trec-2019-deep-learning/trec2019/sparse/sparsenet/test_tube_logs',
    'single_run': False,
    'nb_hopt_trials': 1,
    'log_stdout': False,
    'gpus': None,
    'single_run_gpu': False,
    'default_tensor_type': 'torch.cuda.FloatTensor',
    'use_amp': False,
    'check_grad_nans': False,
    'amp_level': 'O2',
    'on_cluster': False,
    'fast_dev_run': True,
    'enable_tqdm': False,
    'overfit': -1,
    'interactive': False,
    'debug': False,
    'local': False,
    'lr_scheduler_milestones': None,
    'k_inference_factor': 1.5,
    'weight_sparsity': [0.3],
    'boost_strength': 1.5,
    'boost_strength_factor': 0.85,
    'dropout': 0.0,
    'use_batch_norm': True,
    'normalize_weights': False,
    'hpc_exp_number': None
}

I see the following errors because I have None value in my hparams dict


Traceback (most recent call last):
  File "sparsenet_trainer.py", line 73, in <module>
    main(hparam_trial)
  File "sparsenet_trainer.py", line 47, in main
    trainer.fit(model)
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 707, in fit
    self.run_pretrain_routine(model)
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 757, in run_pretrain_routine
    self.logger.log_hyperparams(ref_model.hparams)
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/pytorch_lightning/logging/base.py", line 14, in wrapped_fn
    fn(self, *args, **kwargs)
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/pytorch_lightning/logging/tensorboard.py", line 88, in log_hyperparams
    self.experiment.add_hparams(hparam_dict=params, metric_dict={})
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 300, in add_hparams
    exp, ssi, sei = hparams(hparam_dict, metric_dict)
  File "/Users/kyoungrok/anaconda3/envs/trec/lib/python3.7/site-packages/torch/utils/tensorboard/summary.py", line 156, in hparams
    raise ValueError('value should be one of int, float, str, bool, or torch.Tensor')
ValueError: value should be one of int, float, str, bool, or torch.Tensor

Environment

PyTorch version: 1.4.0 Is debug build: No CUDA used to build PyTorch: None

OS: Mac OSX 10.15.3 GCC version: Could not collect CMake version: version 3.16.1

Python version: 3.7 Is CUDA available: No CUDA runtime version: No CUDA GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA

Versions of relevant libraries: [pip] numpy==1.18.1 [pip] pytorch-lightning==0.6.0 [pip] torch==1.4.0 [pip] torchvision==0.5.0 [conda] blas 1.0 mkl [conda] mkl 2019.4 233 [conda] mkl-service 2.3.0 py37hfbe908c_0 [conda] mkl_fft 1.0.15 py37h5e564d8_0 [conda] mkl_random 1.1.0 py37ha771720_0 [conda] pytorch 1.4.0 py3.7_0 pytorch [conda] pytorch-lightning 0.6.0 pypi_0 pypi [conda] torchvision 0.5.0 py37_cpu pytorch

Borda commented 4 years ago

Hello, thanks for letting us know, could you also share with us the model/trainer so we can replicate your problem... :]

kyoungrok0517 commented 4 years ago

Hi, thanks for the response. Here's my code and data. The error can be reproduced under pytorch-lightning >= 0.5.3.3

https://www.dropbox.com/sh/5dyq5dp5l8zfc3r/AAAiK-IoihpgdnJ3L8QzKAxNa?dl=0

Borda commented 4 years ago

@kyoungrok0517 and what would be your expected behaviour, just ignore all parameters with none value?

kyoungrok0517 commented 4 years ago

I think that will be the best option. I expect that happens automatically inside the lightning library.


보낸 사람: Jirka Borovec notifications@github.com 보낸 날짜: Wednesday, March 4, 2020 3:40:59 AM 받는 사람: PyTorchLightning/pytorch-lightning pytorch-lightning@noreply.github.com 참조: Kyoungrok Jang kyoungrok.jang@gmail.com; Mention mention@noreply.github.com 제목: Re: [PyTorchLightning/pytorch-lightning] Logger emits exception when there's None in hparams (#984)

@kyoungrok0517https://github.com/kyoungrok0517 and what would be your expected behaviour, just ignore all parameters with none value?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/PyTorchLightning/pytorch-lightning/issues/984?email_source=notifications&email_token=AAIAZ7CG4NMLHAFGWZTZQU3RFVFLXA5CNFSM4K6RL3L2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENUU3EY#issuecomment-594103699, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAIAZ7D2RH2S3JX523IFRULRFVFLXANCNFSM4K6RL3LQ.

awaelchli commented 4 years ago

This is fixed on master. See #1130. @Borda it can be closed.