ahmedramly closed this issue 2 years ago.
Hi, thanks for your questions.
This error means that Mango tried to train the CNN model 7 times, but training succeeded only 6 times. By default, Mango runs 2 random initial trainings plus the 5 iterations set by your config["num_iteration"] parameter. A training run can fail for many reasons; you can wrap the training code in a try/except block to see whether an exception is raised during CNN training.
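A minimal sketch of that check, assuming a hypothetical train_fn(params) that trains one model and returns its loss (neither train_fn nor the toy fragile_train below comes from Mango or from this thread). The wrapper catches the exception and keeps the full traceback, so the failing configuration and its cause can be printed instead of silently disappearing from the run.

```python
import traceback


def evaluate_safely(train_fn, params):
    """Run one training call; return (value, None) on success or (None, traceback_text) on failure.

    train_fn is a hypothetical callable that trains one model for the given
    hyperparameters and returns a scalar objective value.
    """
    try:
        return train_fn(params), None
    except Exception:
        # format_exc() returns the traceback of the exception currently being handled.
        return None, traceback.format_exc()


def fragile_train(params):
    # Toy stand-in for CNN training that fails when the kernel is larger than the input.
    if params["kernel_size"] > params["input_size"]:
        raise ValueError("kernel_size larger than input_size")
    return 1.0 / params["kernel_size"]


for params in [{"kernel_size": 3, "input_size": 32}, {"kernel_size": 64, "input_size": 32}]:
    value, error = evaluate_safely(fragile_train, params)
    if error is not None:
        print(f"Training failed for {params}:\n{error}")
    else:
        print(f"Training succeeded for {params}: {value:.3f}")
```

Printing the traceback per configuration makes it easy to see whether failures come from specific hyperparameter values (as in the toy kernel-size case) or from something unrelated, such as data loading.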
The fix is simple: change the objective function so that it returns only the successful trainings and their values. The example below introduces random failures 30% of the time and returns only the successful parameters and their respective values.
https://github.com/ARM-software/mango/blob/master/examples/Failure_Handling.ipynb
Basically, the objective function now returns two lists: the hyperparameter configurations that trained successfully and their corresponding objective values. In the example above, these are hyper_evaluated and objective_evaluated.
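The two-list pattern can be sketched as follows. This is a simplified illustration, not the code from the linked notebook: train_cnn is a hypothetical stand-in that fails about 30% of the time and returns a toy loss. The objective receives a batch of configurations and returns only those that trained, each paired with its value, so the two lists always have the same length.

```python
import random


def train_cnn(params):
    """Hypothetical stand-in for training one CNN; fails about 30% of the time."""
    if random.random() < 0.3:
        raise RuntimeError("simulated training failure")
    # Toy objective: pretend the loss is proportional to the learning rate.
    return params["lr"] * 10


def objective(params_batch):
    """Return only the configurations that trained, plus their objective values."""
    hyper_evaluated = []
    objective_evaluated = []
    for params in params_batch:
        try:
            value = train_cnn(params)
        except Exception as exc:
            print(f"Failed evaluation for {params}: {exc}")
            continue
        # Append both only after success so the two lists stay aligned.
        hyper_evaluated.append(params)
        objective_evaluated.append(value)
    return hyper_evaluated, objective_evaluated


random.seed(0)
batch = [{"lr": lr} for lr in (0.1, 0.01, 0.001, 0.05, 0.2)]
hyper_evaluated, objective_evaluated = objective(batch)
print(len(hyper_evaluated), "of", len(batch), "configurations trained successfully")
```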
Alternatively, if you are not able to train a particular CNN, you can assign a high loss value to that failed configuration instead of dropping it.
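A minimal sketch of the penalty approach, assuming a hypothetical train_fn(params) that returns a loss, and an arbitrary FAILURE_LOSS constant chosen to be worse than any real loss. Because every configuration still gets a value, the objective can return a single list with one entry per configuration. This only makes sense when the tuner is minimizing (for example Mango's tuner.minimize()); under maximization a large loss would look like the best result.

```python
FAILURE_LOSS = 1e6  # hypothetical penalty; must be worse than any loss a real run can produce


def objective_with_penalty(params_batch, train_fn):
    """Return one loss per configuration, using FAILURE_LOSS for failed trainings.

    train_fn is a hypothetical callable that trains one CNN and returns its loss.
    """
    losses = []
    for params in params_batch:
        try:
            loss = train_fn(params)
        except Exception as exc:
            print(f"Failed evaluation for {params}: {exc}")
            loss = FAILURE_LOSS
        losses.append(loss)
    return losses


def toy_train(params):
    # Toy stand-in: batch sizes above 256 "run out of memory".
    if params["batch_size"] > 256:
        raise MemoryError("simulated out-of-memory error")
    return 0.5 + params["batch_size"] / 1000


losses = objective_with_penalty([{"batch_size": 32}, {"batch_size": 512}], toy_train)
print(losses)
```

The trade-off versus dropping failures: the optimizer learns that the failing region is bad and steers away from it, but a very large penalty can distort the surrogate model, so a value only moderately worse than the worst real loss may behave better.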
Thanks for your 'very' quick and informative response; I really appreciate it.
I added a try/except to the objective function and it worked very well:
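```python
import pytorch_lightning as pl  # or: import lightning.pytorch as pl, depending on the installed package
from IPython.display import clear_output
from torch.utils.data import DataLoader

# EEGDataset and BaselineModel are my own classes, defined elsewhere.


def objective(hyperparams):
    trainset = EEGDataset(...)
    testset = EEGDataset(...)
    clear_output()
    train_loader = DataLoader(trainset, batch_size=32, shuffle=True)
    test_loader = DataLoader(testset, batch_size=32)

    hyper_eval = []  # hyperparameter sets that trained successfully
    obj_eval = []    # matching objective values (final training loss)
    for param in hyperparams:
        try:
            model = BaselineModel(param)
            trainer = pl.Trainer(max_epochs=1, accelerator='auto')
            trainer.fit(model, train_loader, test_loader)
            value = trainer.callback_metrics['train_loss'].item()
        except Exception as exc:
            # Skip this configuration so Mango only receives successful evaluations.
            print(f'Failed evaluation for {param}: {exc}')
            continue
        obj_eval.append(value)
        hyper_eval.append(param)
    return hyper_eval, obj_eval
```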
If you find something wrong with this implementation, please let me know. Thanks a lot!
Your implementation seems fine to me. If you face any more issues, please feel free to reopen this issue or create a new one.
Thanks a lot, for sure.
Hi,
I am trying to use Mango for the first time to fine-tune the hyperparameters of a simple CNN model. I chose to start with these simple hyperparameters:
And I defined the objective function as follows:
Then I started the optimization as follows:
I ran 5 trials to check that everything works before starting the real optimization. It worked at the beginning, then it gave me this error:
I don’t understand where this error comes from. It would be very nice if you could help me with this issue.