Closed: eddiebergman closed this 3 months ago
This was fixed with #126. You now immediately get the error and traceback from the worker that evaluated the config that crashed. In this example, I raised ValueError("Something went wrong!") inside the target function.
INFO:neps.api:Starting neps.run using root directory results/hyperparameters_example
INFO:neps.api:Running bayesian_optimization as the searcher
INFO:neps.api:Strategy: bayesian_optimization
INFO:neps.runtime:Launching NePS
INFO:neps.runtime:Worker '176609-2024-08-05T16:36:20.632810+00:00' sampled a new trial: Trial(config={'categorical': 1, 'float1': 0.9083763101496742, 'float2': -7.058519989984767, 'integer1': 1, 'integer2': 19}, metadata=MetaData(id='1', location='results/hyperparameters_example/configs/config_1', previous_trial_id=None, previous_trial_location=None, sampling_worker_id='176609-2024-08-05T16:36:20.632810+00:00', time_sampled=1722875780.6338768, evaluating_worker_id=None, evaluation_duration=None, time_submitted=None, time_started=None, time_end=None), state=<State.PENDING: 'pending'>, report=None)
ERROR:neps.state._eval:Error during evaluation of '1': {'categorical': 1, 'float1': 0.9083763101496742, 'float2': -7.058519989984767, 'integer1': 1, 'integer2': 19}.
ERROR:neps.state._eval:Something went wrong!
Traceback (most recent call last):
File "/home/skantify/code/wandb-neps/vendored/neps/neps/state/_eval.py", line 125, in _eval_trial
user_result = fn(**kwargs, **trial.config)
File "/home/skantify/code/neps/neps_examples/basic_usage/hyperparameters.py", line 11, in run_pipeline
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
INFO:neps.runtime:Worker '176609-2024-08-05T16:36:20.632810+00:00' evaluated trial: 1 as State.CRASHED.
ERROR:neps.runtime:Error during evaluation of '1' : {'categorical': 1, 'float1': 0.9083763101496742, 'float2': -7.058519989984767, 'integer1': 1, 'integer2': 19}.
ERROR:neps.runtime:Something went wrong!
NoneType: None
Traceback (most recent call last):
File "/home/skantify/code/neps/neps_examples/basic_usage/hyperparameters.py", line 25, in <module>
neps.run(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/api.py", line 232, in run
_launch_runtime(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 534, in _launch_runtime
worker.run()
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 356, in run
should_stop = self._check_if_should_stop(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 212, in _check_if_should_stop
raise error_from_this_worker
File "/home/skantify/code/wandb-neps/vendored/neps/neps/state/_eval.py", line 125, in _eval_trial
user_result = fn(**kwargs, **trial.config)
File "/home/skantify/code/neps/neps_examples/basic_usage/hyperparameters.py", line 11, in run_pipeline
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
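The behaviour in the log above, where the worker logs the failure and then re-raises the original exception to the caller, can be sketched roughly as follows. This is only a minimal illustration using the standard library. `eval_trial`, `run_worker` and the `stop_on_error` flag are hypothetical names made up for this sketch and are not the actual NePS internals.

```python
import logging
import traceback

logger = logging.getLogger("worker_sketch")


def run_pipeline(**config):
    # Target function that fails, mirroring the example in this thread.
    raise ValueError("Something went wrong!")


def eval_trial(fn, config):
    """Evaluate one config and return (result, error, formatted traceback)."""
    try:
        return fn(**config), None, None
    except Exception as err:
        tb = traceback.format_exc()
        # Log the failure right away so it shows up in the worker's logs.
        logger.error("Error during evaluation of config %s", config)
        logger.error(tb)
        return None, err, tb


def run_worker(fn, configs, stop_on_error=True):
    """Evaluate configs in order (hypothetical loop, not NePS's runtime)."""
    results = []
    for config in configs:
        result, err, _tb = eval_trial(fn, config)
        if err is not None and stop_on_error:
            # Re-raise the original exception object so the caller sees
            # the traceback pointing into the target function.
            raise err
        results.append(result)
    return results
```

Calling `run_worker(run_pipeline, [{"float1": 0.5}])` raises the `ValueError` from the target function, while `stop_on_error=False` records `None` for the failed config and continues.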
If you have another worker that is set to stop when any error occurs, that worker will also show the error, for example:
INFO:neps.api:Starting neps.run using root directory results/hyperparameters_example
INFO:neps.api:Running bayesian_optimization as the searcher
INFO:neps.api:Strategy: bayesian_optimization
INFO:neps.runtime:Launching NePS
Traceback (most recent call last):
File "/home/skantify/code/neps/neps_examples/basic_usage/hyperparameters.py", line 25, in <module>
neps.run(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/api.py", line 232, in run
_launch_runtime(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 534, in _launch_runtime
worker.run()
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 356, in run
should_stop = self._check_if_should_stop(
File "/home/skantify/code/wandb-neps/vendored/neps/neps/runtime.py", line 269, in _check_if_should_stop
raise err
neps.state.err_dump.SerializedError: An error occurred during the evaluation of a trial '1' which was evaluted by worker '176609-2024-08-05T16:36:20.632810+00:00'. The original error could not be deserialized but had the following information:
ValueError: Something went wrong!
Traceback (most recent call last):
File "/home/skantify/code/wandb-neps/vendored/neps/neps/state/_eval.py", line 125, in _eval_trial
user_result = fn(**kwargs, **trial.config)
File "/home/skantify/code/neps/neps_examples/basic_usage/hyperparameters.py", line 11, in run_pipeline
raise ValueError("Something went wrong!")
ValueError: Something went wrong!
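The second log shows a different worker reporting an error it did not raise itself, so the original exception has to be stored as text and wrapped when it is read back. A minimal sketch of that idea is below. It assumes a JSON payload, and `dump_error`, `load_error` and this local `SerializedError` class are hypothetical stand-ins, not the code in `neps.state.err_dump`.

```python
import json
import traceback


class SerializedError(Exception):
    """Hypothetical wrapper for an error recorded by another worker."""


def dump_error(err, trial_id, worker_id):
    """Serialize the error type, message and traceback to a JSON string."""
    tb = "".join(traceback.format_exception(type(err), err, err.__traceback__))
    return json.dumps(
        {
            "trial_id": trial_id,
            "worker_id": worker_id,
            "err_type": type(err).__name__,
            "err": str(err),
            "tb": tb,
        }
    )


def load_error(payload):
    """Rebuild a wrapper error carrying the stored information as text."""
    info = json.loads(payload)
    return SerializedError(
        f"An error occurred during the evaluation of trial '{info['trial_id']}' "
        f"which was evaluated by worker '{info['worker_id']}'.\n"
        f"{info['err_type']}: {info['err']}\n{info['tb']}"
    )
```

A worker that is configured to stop on any error could then `raise load_error(payload)` after reading the payload from shared state, which gives a message similar in shape to the one shown above.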
@Neeratyoy this is what I meant by errors. I will make a small PR that tells the user how to recover from this.
You can see the improved output in #128 now.
While testing some new things, I got this error, and it contains no information useful for understanding what went wrong.
It did show something in the logs, which is nice, but I feel like these errors should be bubbled all the way up.