autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Gracefully dealing with parameter combinations that cause errors during training #435

Closed thawn closed 1 year ago

thawn commented 4 years ago

1) It would be nice if Talos could add

a way to handle errors that appear only for certain parameter combinations. Sometimes a hyperparameter combination leads to an error that cannot be avoided (such as ResourceExhaustedError).

For example, in the model function that I pass to talos.Scan, I would have the following code:

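import tensorflow as tf

try:
    history = model.fit(...)
except tf.errors.ResourceExhaustedError:
    # This combination does not fit in memory; signal failure to the caller.
    history = None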
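To make the pattern concrete without needing TensorFlow or a GPU, here is a minimal sketch. The helper `fit_or_none`, the stand-in exception `FakeResourceExhaustedError`, and the toy `fake_fit` function are all hypothetical names used for illustration; in real code the exception would be `tf.errors.ResourceExhaustedError` and the callable would wrap `model.fit(...)`. Whether Talos accepts `None` as a history is exactly the open question of this issue, so this only shows the model-function side.

```python
class FakeResourceExhaustedError(Exception):
    """Stand-in for tf.errors.ResourceExhaustedError (assumption: lets the sketch run without TensorFlow)."""


def fit_or_none(fit_fn, recoverable=(FakeResourceExhaustedError,)):
    """Call fit_fn(); return its result, or None if a recoverable error is raised.

    Any exception type not listed in `recoverable` still propagates.
    """
    try:
        return fit_fn()
    except recoverable as err:
        print(f"skipping combination: {type(err).__name__}: {err}")
        return None


def fake_fit(batch_size):
    # Hypothetical training call: pretend that large batches exhaust memory.
    if batch_size > 256:
        raise FakeResourceExhaustedError(f"OOM at batch_size={batch_size}")
    return {"loss": [1.0 / batch_size]}


ok = fit_or_none(lambda: fake_fit(64))        # training "succeeds"
failed = fit_or_none(lambda: fake_fit(1024))  # training "runs out of memory"
```

Listing the recoverable exception types explicitly keeps genuine bugs (for example a `TypeError` in the model function) from being silently swallowed.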

Alternatively (and, IMHO, even nicer), Talos could provide a way to catch errors during training internally.
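As a rough illustration of what catching errors inside the scan loop could look like, here is a small sketch. It is not Talos code and does not reflect how talos.Scan is implemented: the `scan` function, its `recoverable` parameter, and `toy_model` are hypothetical, and the built-in `MemoryError` stands in for ResourceExhaustedError.

```python
import itertools


def scan(model_fn, params, recoverable=(MemoryError,)):
    """Try every combination in params; record failures instead of aborting."""
    results, failures = [], []
    keys = list(params)
    for values in itertools.product(*(params[k] for k in keys)):
        combo = dict(zip(keys, values))
        try:
            score = model_fn(**combo)
        except recoverable as err:
            # Remember which combination failed and why, then move on.
            failures.append((combo, repr(err)))
            continue
        results.append((combo, score))
    return results, failures


def toy_model(batch_size, lr):
    # Hypothetical model function: large batches "run out of memory".
    if batch_size > 256:
        raise MemoryError("batch too large")
    return lr * batch_size


results, failures = scan(toy_model, {"batch_size": [64, 512], "lr": [0.1, 0.01]})
print(len(results), "combinations succeeded,", len(failures), "failed")
```

Keeping the failed combinations in a separate list means the experiment log can still show which parts of the parameter space were unusable.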

2) Once implemented, I can see how this feature will

make it possible to scan a parameter space that includes some combinations that lead to errors.

3) I believe this feature is

4) Given the chance, I'd be happy to make a PR for this feature

I would need pointers on how best to implement this:


github-actions[bot] commented 4 years ago

Welcome to the Talos community! Thanks so much for creating your first issue :)