I am running a hyperparameter search for a deep learning model, and in some trials the model diverges and starts producing NaNs in its output. My code exits gracefully, returning a tuple of NaNs as metrics. This is expected behavior, but the Optuna sweeper treats it as an error and crashes.
[ ] I created a minimal repro (See this for tips).
To reproduce
Minimal Code/Config snippet to reproduce
Stack trace/error message
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/hydra_plugins/hydra_optuna_sweeper/_impl.py", line 237, in sweep
    study.tell(trial=trial, state=state, values=values)
  File "/opt/conda/lib/python3.8/site-packages/optuna/study/study.py", line 652, in tell
    raise ValueError(values_conversion_failure_message)
ValueError: Trial 0 failed, because the objective function returned nan.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 386, in <lambda>
    lambda: hydra.multirun(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 140, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
  File "/opt/conda/lib/python3.8/site-packages/hydra_plugins/hydra_optuna_sweeper/optuna_sweeper.py", line 42, in sweep
    return self.sweeper.sweep(arguments)
  File "/opt/conda/lib/python3.8/site-packages/hydra_plugins/hydra_optuna_sweeper/_impl.py", line 240, in sweep
    study.tell(trial=trial, state=state, values=values)
  File "/opt/conda/lib/python3.8/site-packages/optuna/study/study.py", line 592, in tell
    raise ValueError(
ValueError: Values were told. Values cannot be specified when state is TrialState.PRUNED or TrialState.FAIL.
Expected Behavior
If the task function returns NaN, the sweeper should mark the trial as failed and continue with the remaining trials without crashing.
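A minimal sketch of the behavior I have in mind, using only the standard library. The helper name `classify_trial` and the string labels are hypothetical and are not part of hydra-optuna-sweeper or Optuna; they only illustrate checking for NaN before deciding how to report the trial.

```python
import math
from typing import List, Optional, Sequence, Tuple


def classify_trial(values: Sequence[float]) -> Tuple[str, Optional[List[float]]]:
    """Decide how a finished trial should be reported (hypothetical helper).

    Returns ("FAIL", None) if any metric is NaN, otherwise
    ("COMPLETE", values converted to floats).
    """
    floats = [float(v) for v in values]
    if any(math.isnan(v) for v in floats):
        # A failed trial is reported without values, since the second
        # traceback shows that values are rejected for FAIL/PRUNED states.
        return "FAIL", None
    return "COMPLETE", floats


# A diverged run returning NaN metrics is classified as failed.
print(classify_trial((float("nan"), 0.5)))
# A normal run keeps its metrics.
print(classify_trial((1, 0.5)))
```

On the sweeper side, the "FAIL" case would presumably correspond to calling study.tell with state=TrialState.FAIL and no values argument, which is what the second error message suggests is required.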
🐛 Bug
Description
study.tell raises an error here: https://github.com/optuna/optuna/blob/release-v2.10.0/optuna/study/study.py#L652 because of the check performed here: https://github.com/optuna/optuna/blob/release-v2.10.0/optuna/study/_optimize.py#L319
Checklist
System information