[[48, 0, 0], 1200.0, {"submitted": 1592881287.3019679, "started": 1592881287.3047717, "finished": 1592882478.390403}, {"loss": 0.4213403548487583, "info": {"loss": 0.2595444439162671, "model_parameters": 496799.0, "train_metric_ql": 0.25954445002785426, "lr_scheduler_converged": 0.0, "lr": 0.004473769175123476, "val_metric_ql": 0.4213403548487583}}, null]
[[48, 0, 1], 1200.0, {"submitted": 1592882478.4058812, "started": 1592882478.4085217, "finished": 1592882508.2846193}, null, "Traceback (most recent call last):\n File \"/home/ubuntu/anaconda3/envs/autopytorch/lib/python3.6/site-packages/hpbandster/core/worker.py\", line 206, in start_computation\n result = {'result': self.compute(*args, config_id=id, **kwargs),\n File \"/home/ubuntu/anaconda3/envs/autopytorch/lib/python3.6/site-packages/autoPyTorch-0.0.2-py3.6.egg/autoPyTorch/core/worker.py\", line 87, in compute\n raise Exception(\"Exception in train pipeline. Took \" + str((time.time()-start_time)) + \" seconds with budget \" + str(budget))\nException: Exception in train pipeline. Took 29.862977743148804 seconds with budget 1200.0\n"]
[[48, 0, 2], 1200.0, {"submitted": 1592882508.302436, "started": 1592882508.3056898, "finished": 1592882515.6628084}, null, "Traceback (most recent call last):\n File \"/home/ubuntu/anaconda3/envs/autopytorch/lib/python3.6/site-packages/hpbandster/core/worker.py\", line 206, in start_computation\n result = {'result': self.compute(*args, config_id=id, **kwargs),\n File \"/home/ubuntu/anaconda3/envs/autopytorch/lib/python3.6/site-packages/autoPyTorch-0.0.2-py3.6.egg/autoPyTorch/core/worker.py\", line 87, in compute\n raise Exception(\"Exception in train pipeline. Took \" + str((time.time()-start_time)) + \" seconds with budget \" + str(budget))\nException: Exception in train pipeline. Took 7.344435214996338 seconds with budget 1200.0\n"]
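For context, each line of this log appears to be a JSON array of the form [config_id, budget, timestamps, result, exception], where a failed run has a null result and a traceback string in the last slot. A minimal sketch for listing the failed runs, assuming that format (this is not an Auto-PyTorch API, just a helper for reading the file):

```python
import json
import os

def failed_runs(path):
    # Each line is a JSON array: [config_id, budget, timestamps, result, exception].
    # A failed run has result == null (None) and a traceback string last.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            config_id, budget, timestamps, result, exception = json.loads(line)
            if result is None:
                yield config_id, budget, exception

# Print which configurations failed, if the log file is present.
if os.path.exists("results.json"):
    for config_id, budget, exc in failed_runs("results.json"):
        print(f"config {config_id} failed on budget {budget}")
```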
If I infer what went wrong from the positions of the exception and the null, the loss was null, so there may have been a problem computing "info": {"loss": , "model_parameters": , "train_metric_ql": , "lr_scheduler_converged": , "val_metric_ql": }. If that is indeed the only problem (something like the loss going to nan), I think it would be better to log what actually happened instead of logging several exceptions.
The snippet above is part of my log file results.json.
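If the root cause really is the loss becoming non-finite, one way to surface it explicitly, instead of letting a bare exception propagate from the train pipeline, would be a check like the following. This is a hypothetical sketch (check_loss is not an Auto-PyTorch function), just to illustrate the kind of logging I mean:

```python
import logging
import math

logger = logging.getLogger("train")

def check_loss(loss_value, epoch):
    # Log a descriptive message and abort this configuration
    # when the loss is no longer finite (nan or inf).
    if not math.isfinite(loss_value):
        logger.error(
            "Loss became %s at epoch %d; aborting this configuration",
            loss_value, epoch,
        )
        raise ValueError(f"Non-finite loss ({loss_value}) at epoch {epoch}")
    return loss_value
```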