evhub / bbopt

Black box hyperparameter optimization made easy.
Apache License 2.0
75 stars 8 forks

run_backend with Mixture option #10

akbaramed closed this issue 3 years ago

akbaramed commented 3 years ago

Hello, first of all, very nice and easy-to-use package. For some time on Google Colab I have been trying to use the mixture functionality with multiple algorithms, but it always picks "random". Here is some of the code I have used.

Jupyter cell 1:

```python
from bbopt.backends.mixture import MixtureBackend
from bbopt import BlackBoxOptimizer
```

Jupyter cell 2:

```python
bb = BlackBoxOptimizer("file", tag='index_' + str(comp_idx))
```

Jupyter cell 3:

```python
bb.run_backend("mixture", [
    ("random", 1),
    ("gaussian_process", 1),
    ("random_forest", 1),
    ("gradient_boosted_regression_trees", 1),
])
# or, with a different ordering:
# bb.run_backend("mixture", [
#     ("gradient_boosted_regression_trees", 1),
#     ("random", 1),
#     ("gaussian_process", 1),
#     ("random_forest", 1),
# ])

# ... model code ...

if isinstance(bb.backend, MixtureBackend):
    bb.remember({
        "alg": bb.backend.selected_alg,
        "l1_reg": l1_reg,
        # ...
    })
bb.maximize(vald_f1)
```

Jupyter cell 4:

```python
loop = 70
opt_verbose = 0
for i in range(loop):
    try:
        run_me_RNN(opt_verbose)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()
    except Exception as e:
        print('Something went wrong...', e)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()

print("\nSummary of best run:")
pprint(bb.get_optimal_run())
```

I have tried running the code a couple of times, but it always picks the "random" algorithm. Do I need more iterations, or is something missing on my end? On the other hand,

```python
bb.run(alg='gaussian_process')  # or 'gradient_boosted_regression_trees'
```

runs perfectly fine. On a side note, I am doing multi-label classification (unbalanced dataset) and maximizing the validation F1 score. I just want to make sure it's the right error metric to optimize. Some other parameters I have are training & validation loss and training F1 score.

Any help on the above issues will be greatly appreciated.

Many thanks

evhub commented 3 years ago

@akbaramed You need to call `bb.run` and `bb.maximize` in every loop, otherwise BBopt won't know when you've moved on to the next iteration. You'll also need to call `bb.remember` in every loop if you want to see what alg is being used.

akbaramed commented 3 years ago

Hello, thank you for the message. The code I shared is part of the function `run_me_RNN`, which gets called in the loop.

```python
def run_me_RNN(opt_verbose):
    bb.run_backend("mixture", [
        ("random", 1),
        ("gaussian_process", 1),
        ("random_forest", 1),
        ("gradient_boosted_regression_trees", 1),
    ])
    # or, with a different ordering:
    # bb.run_backend("mixture", [
    #     ("gradient_boosted_regression_trees", 1),
    #     ("random", 1),
    #     ("gaussian_process", 1),
    #     ("random_forest", 1),
    # ])

    # ... model code ...

    if isinstance(bb.backend, MixtureBackend):
        bb.remember({
            "alg": bb.backend.selected_alg,
            "l1_reg": l1_reg,
            # ...
        })
    bb.maximize(vald_f1)


loop = 70
opt_verbose = 0
for i in range(loop):
    try:
        run_me_RNN(opt_verbose)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()
    except Exception as e:
        print('Something went wrong...', e)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()

print("\nSummary of best run:")
pprint(bb.get_optimal_run())
```

It always picks up the random algorithm.

Thanks for looking into it.

evhub commented 3 years ago

@akbaramed Hmmm... in that case, I'm not sure what's happening. Are you getting any error messages? It could be that it's having trouble writing your data, or maybe you're resetting `random.seed` in every loop somehow?

akbaramed commented 3 years ago

Makes sense. I had commented out the `random.seed` code in the `run_me_RNN` function; let me put it back and then try the code with the mixture algorithms. Hopefully it will work. Thanks for the suggestion.

evhub commented 3 years ago

@akbaramed Yes, if you were calling `random.seed` in every loop, that would cause the error you are experiencing.
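As a toy illustration of that failure mode (plain `random` module usage, not bbopt's actual selection code): re-seeding inside the loop makes every draw identical, so any backend selection done through the `random` module would come out the same on every iteration.

```python
# Re-seeding inside the loop vs. seeding once up front.
import random

algs = ["random", "gaussian_process", "random_forest",
        "gradient_boosted_regression_trees"]

reseeded = []
for _ in range(10):
    random.seed(42)          # re-seeding every iteration: identical draw each time
    reseeded.append(random.choice(algs))

random.seed(42)              # seeding once before the loop is fine
seeded_once = [random.choice(algs) for _ in range(10)]

print(len(set(reseeded)))    # 1 -- only one algorithm is ever selected
print(len(set(seeded_once))) # more than 1: the draws actually vary
```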

akbaramed commented 3 years ago

Hello, looking at your comment above, I think I didn't communicate this clearly. I am not using `random.seed`. I also tried running the code with `random.seed` added, but nothing changes; I still get only one algorithm from the list of provided algorithms. I have tried multiple loops of 20, 30, 50, and 70, and still only one algorithm is selected from the list:

```python
bb.run_backend("mixture", [
    ("gradient_boosted_regression_trees", 1),
    ("random", 1),
    ("gaussian_process", 1),
    ("random_forest", 1),
])
```

Could it be that the number of hyperparameters I am using requires more loops before the algorithm changes? Listing all the hyperparameters below:

```python
def run_me_RNN(df_ac, num_class, fault_col, opt_verbose):
    # bb.run(alg='gaussian_process')
    bb.run_backend("mixture", [
        ("gradient_boosted_regression_trees", 1),
        ("random", 1),
        ("gaussian_process", 1),
        ("random_forest", 1),
    ])

    seq_len_size = bb.choice('seq_len_size', [14, 21, 35])
    split_size = bb.choice('split_size', [0.2, 0.25, 0.3, 0.35])
    drop_size = bb.choice('drop_size', [0.2, 0.3, 0.4, 0.50])  # 0.10,
    drop_size_d = bb.choice('drop_size_d', [0.2, 0.3, 0.4, 0.50])  # 0.10,
    drop_size_r = bb.choice('drop_size_r', [0.2, 0.3, 0.4, 0.50])  # 0.10,
    rec_drop_size = bb.choice('rec_drop_size', [0.20, 0.25, 0.3, 0.35, 0.4, 0.50])

    hidden_layer_1 = bb.choice('hidden_layer_1', [32, 64, 128])
    hidden_layer_2 = bb.choice('hidden_layer_2', [32, 64, 128])
    hidden_layer_d = bb.choice('hidden_layer_d', [32, 64, 128, 256])
    batch_size = bb.choice('batch_size', [32, 64, 128, 256])
    l1_reg = bb.uniform('l1_reg', 0.001, 0.02)
    l2_reg = bb.uniform('l2_reg', 0.001, 0.02)

    optimizer_lr = bb.uniform('optimizer_lr', 0.001, 0.02)
    opt_opti = bb.choice('opt_opti', ['rmsprop', 'adam'])
    activation_opt_bi = bb.choice('activation_opt_bi', ['relu', 'swish', 'tanh'])
    activation_opt_d = bb.choice('activation_opt_d', ['relu', 'swish', 'tanh'])  # ['swish']
    fl_gamma_c = bb.choice('fl_gamma_c', [0.5, 1., 2., 5.])  # 0.25
    fl_alpha_c = bb.choice('fl_alpha_c', [1.5, 2., 2.5, 3., 4.])
    embed_opt = bb.choice('embed_opt', [0.7, 0.75, 0.8, 0.85, 0.9, 0.95])

    # ... model code ...

    if isinstance(bb.backend, MixtureBackend):
        bb.remember({
            "alg": bb.backend.selected_alg,
            "l1_reg": l1_reg,
            "l2_reg": l2_reg,
            "optimizer_lr": optimizer_lr,
            "opt_opti": opt_opti,
            "seq_len_size": seq_len_size,
            "split_size": split_size,
            "training loss": var_loss,
            "training precision": var_precision,
            "training recall": var_recall,
            "training accuracy": var_acc,
            "training auc": var_auc,
            "training f1-score": var_f1,
            "best epoch": best_epoch,
            "validation loss": vald_loss,
            "validation precision": vald_precision,
            "validation recall": vald_recall,
            "validation accuracy": vald_acc,
            "validation auc": vald_auc,
            "validation f1-score": vald_f1,
        })
    bb.maximize(vald_f1)
    # END FUNCTION


loop = 30
opt_verbose = 0
for i in range(loop):
    try:
        run_me_RNN(df_ac, num_class, fault_col, opt_verbose)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()
    except Exception as e:
        print('Something went wrong...', e)
        print("Summary of run {}/{}:".format(i + 1, loop))
        pprint(bb.get_current_run())
        print()

print("\nSummary of best run:")
pprint(bb.get_optimal_run())
```

Thanks for looking into it

evhub commented 3 years ago

@akbaramed The number of parameters should not be an issue, but I can't replicate this; the mixture backend is working fine for me. Are you sure you're not implicitly calling `random.seed` somewhere? Also, have you ensured you're using the latest version of bbopt with `pip install -U bbopt`?
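One quick way to confirm from inside the notebook which version is actually importable (a standard-library sketch; `installed_version` is a hypothetical helper, and the nonexistent package name is made up for illustration):

```python
# Look up an installed distribution's version via importlib.metadata
# (available in Python 3.8+), returning None when it is not installed.
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string for dist_name, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("bbopt"))  # e.g. "1.1.14" on the Colab install above, or None
print(installed_version("definitely-not-a-real-package"))  # None
```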

akbaramed commented 3 years ago

Hello,

I am using a Google Colab notebook; the install message below shows what version of bbopt I have:

```
Successfully installed bbopt-1.1.14 hyperopt-0.2.5 portalocker-2.0.0 py4j-0.10.9 pyaml-20.4.0 pyspark-3.0.1 scikit-optimize-0.8.1
```

I am very sure that I am not using any kind of `random.seed` anywhere.

Thanks for looking into the problem. What more information can I provide to help you troubleshoot?

thanks