KevinMusgrave / powerful-benchmarker

A library for ML benchmarking. It's powerful.

Problem with hp optimisation #5

Closed AlexFridman closed 4 years ago

AlexFridman commented 4 years ago

Hi! I'm running hyperparameter optimization (on my server), and it freezes after the first trial. The only thing I changed is that I added my own dataset class and updated the corresponding configuration parameter. Previously, I ran run.py without any problems.

The double slash in the log below looks suspicious: `bayesian_optimizer_logs//log00000.json`

My run command:

python run_bayesian_optimization.py --bayesian_optimization_n_iter 50 --loss_funcs~OVERRIDE~ {metric_loss: {MultiSimilarityLoss: {alpha~BAYESIAN~: [0.01, 50], beta~BAYESIAN~: [0.01, 50], base~BAYESIAN~: [0, 1]}}} --mining_funcs~OVERRIDE~ {post_gradient_miner: {MultiSimilarityMiner: {epsilon~BAYESIAN~: [0, 1]}}} --experiment_name test5050_multi_similarity_with_ms_miner --root_experiment_folder experiments_opt --pytorch_home models

Could you please help? Thanks!


```
INFO:root:embedding dimensionality is 128
WARNING clustering 45 points to 9 centroids: please provide at least 351 training points                                                            
INFO:root:New best accuracy!                                                                                                                        
INFO:root:SPLIT: Test50_50_Partitions4_3 / train / length 140                                                                                       
INFO:root:TRAINING EPOCH 3                                                                                                                          
total_loss=0.27627: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:19<00:00,  5.08it/s]
INFO:root:TRAINING EPOCH 4                                                                                                                          
total_loss=0.18487: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:19<00:00,  5.10it/s]
INFO:root:COLLECTING DATASETS FOR EVAL                                                                                                              
INFO:root:SPLIT: Test50_50_Partitions4_3 / train / length 140                                                                                       
INFO:root:SPLIT: Test50_50_Partitions4_3 / val / length 45                                                                                          
INFO:root:Evaluating epoch 4                                                                                                                        
INFO:root:Getting embeddings for the train split                                                                                                    
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12.46it/s]
INFO:root:Getting embeddings for the val split                                                                                                      
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.73it/s]
INFO:root:Computing accuracy for the train split                                                                                                    
INFO:root:running k-nn with k=5                                                                                                                     
INFO:root:embedding dimensionality is 128                                                                                                           
WARNING clustering 140 points to 28 centroids: please provide at least 1092 training points                                                         
INFO:root:Computing accuracy for the val split                                                                                                      
INFO:root:running k-nn with k=5                                                                                                                     
INFO:root:embedding dimensionality is 128                                                                                                           
WARNING clustering 45 points to 9 centroids: please provide at least 351 training points                                                            
[INFO 01-25 14:03:10] ax.service.ax_client: Completed trial 0 with data: {'mean_average_r_precision': (0.57, 0.03)}.                                
[INFO 01-25 14:03:10] ax.service.ax_client: Saved JSON-serialized state of optimization to `experiments_opt/bayesian_optimizer_logs//log00000.json`.
```

KevinMusgrave commented 4 years ago

Hey I've actually had hanging issues too, and I'm not sure what's causing it because it seems to happen randomly (i.e. not every single time). I'm pretty sure double slashes don't affect anything, but to be safe, I've gone and replaced all my path-forming strings with os.path.join, in this repo, record-keeper, and easy-module-attribute-getter. Hopefully the changes didn't break anything!
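
For what it's worth, os.path.join is what avoids those stray separators; a quick illustration of the difference (generic Python, not the repo's actual code):

```python
import os

experiment_folder = "experiments_opt/"

# Manual concatenation produces a double slash when the folder already ends
# with a separator:
print(experiment_folder + "/" + "bayesian_optimizer_logs")
# -> experiments_opt//bayesian_optimizer_logs

# os.path.join handles the separator for you:
print(os.path.join(experiment_folder, "bayesian_optimizer_logs"))
# -> experiments_opt/bayesian_optimizer_logs
```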

If it still hangs, you could try using my script_wrapper.sh. It's a hacky solution but works well for me. Basically it checks the folder in which your experiment should be saving stuff. If there have been no updates to that folder or its subfolders in X minutes, then it kills the process, and starts a new one. (The run_bayesian_optimization.py script always resumes from the latest possible iteration). Here's how you can use it if you're interested:

  1. First, change the "experiment_folder" variable in script_wrapper.sh to your own folder (leave $experiment_name there).
  2. Go to process_checker.sh and change the number of minutes in line 9 to the amount of time that you want to wait before you consider something to be hanging.
  3. Run like this:
    ./script_wrapper.sh <name of your script> <experiment_name>

    In your case, let's say I pasted your bash command into "bayesian_script.sh". Then I would run:

    ./script_wrapper.sh bayesian_script.sh experiments_opt

    This will kill the process if there have been no changes in "/home/blah/experiments_opt" and its subfolders, in X minutes (where you set X in process_checker.sh).

If a lot of hanging occurs, then you'll probably end up with extra experiment folders in experiments_opt. For example, if you run 10 iterations of bayesian optimization, you might end up with, say, 13 experiment folders. This shouldn't affect the final "best_parameters" yaml file, because the script keeps track of the folders that actually finished without hanging.

The only real downside to using the script_wrapper is that you have to use kill -9 to kill the process, because Ctrl-C will only kill the script_wrapper and not the wrapped script.
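
In case it helps, here's a rough Python sketch of the same watchdog idea, i.e. what script_wrapper.sh and process_checker.sh do in spirit (this is not the actual scripts, and the function/argument names are made up for illustration):

```python
import os
import subprocess
import time

def newest_mtime(folder):
    # Most recent modification time of the folder or anything inside it.
    latest = os.path.getmtime(folder)
    for root, dirs, files in os.walk(folder):
        for name in dirs + files:
            latest = max(latest, os.path.getmtime(os.path.join(root, name)))
    return latest

def run_with_watchdog(command, experiment_folder, timeout_minutes=30):
    # Restart the command whenever it stops writing output for too long.
    # (The real bayesian optimization script resumes from the latest
    # finished iteration, so restarting is safe.)
    while True:
        process = subprocess.Popen(command)
        while process.poll() is None:
            time.sleep(60)
            stalled = (os.path.isdir(experiment_folder) and
                       time.time() - newest_mtime(experiment_folder) > timeout_minutes * 60)
            if stalled:
                process.kill()   # no output for too long: assume it's hanging
                process.wait()
                break
        if process.returncode == 0:
            return               # the wrapped script finished normally

# Example usage (hypothetical script name):
# run_with_watchdog(["bash", "bayesian_script.sh"], "experiments_opt")
```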

Also FYI, with the latest version of easy-module-attribute-getter, you can use the \~OVERRIDE\~ flag within nested dictionaries. For example, if you want to change the optimizer for the trunk model only and not the embedder, you can do:

python run.py \
--experiment_name test2 \
--optimizers {trunk_optimizer~OVERRIDE~: {RMSprop: {lr: 0.01}}} 

(Previously, I was only using \~OVERRIDE\~ at the top level of nested dictionaries, so in the above example you used to have to redefine the embedder optimizer.)
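
In other words, the old top-level override would have looked roughly like the following, where you also had to spell out the embedder optimizer even though only the trunk optimizer changed (the embedder values here are just placeholders):

python run.py \
--experiment_name test2 \
--optimizers~OVERRIDE~ {trunk_optimizer: {RMSprop: {lr: 0.01}}, embedder_optimizer: {RMSprop: {lr: 0.01}}}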

Hope this helps!

AlexFridman commented 4 years ago

Thank you, @KevinMusgrave, for the quick response! I'll try it shortly. And thank you once again for the pytorch-metric-learning package. It's well written and easy to use.

KevinMusgrave commented 4 years ago

@AlexFridman I may have fixed the hanging issue. I think it was caused by the pytorch dataloader processes not being killed properly, so in the latest commit, I'm manually deleting the tester and trainer objects here:

https://github.com/KevinMusgrave/powerful-benchmarker/blob/27ff469c9457906044d306226b077518b0290a0c/run.py#L46
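
Conceptually, the change just makes sure nothing keeps holding on to the trainer/tester (and the DataLoader iterators inside them) once a trial is done. A minimal sketch of that idea, not the actual run.py code:

```python
import gc

def cleanup_trial(trainer, tester):
    # Drop the references so the trainer/tester, and the PyTorch DataLoader
    # iterators they own, can be garbage-collected. That lets the DataLoader
    # worker processes exit instead of keeping the run alive between trials.
    # (The caller must not keep its own references around either.)
    del trainer
    del tester
    gc.collect()
```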

AlexFridman commented 4 years ago

Thank you, @KevinMusgrave! I suspected it was related to multiprocessing stuff; that's typical when code just freezes without any stack trace.

One more question about the benchmarker: we have a huge dataset, and we manually define the partition scheme in a dataset class (e.g. only 1% for validation), because the evaluation stage fails when it tries to put the whole dataset into faiss-gpu. As far as I can see, there's a call to the eval_model method during training without specifying the splits_to_exclude parameter. Therefore (as I understand it), it tries to run evaluation on our huge dataset even if we've specified eval_reference_set: compared_to_self and splits_to_eval=val.

Is it possible to somehow overcome this issue without decreasing the train size?

KevinMusgrave commented 4 years ago

@AlexFridman Thanks for pointing out this issue. Now in the latest commit, you should be able to do --splits_to_eval val, and during the validation step of training, it will compute embeddings and accuracy only for the val split. (If you don't specify splits_to_eval, then the default is to compute accuracy for all splits, excluding the test set.)
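
For example, just append it to your original command (where <your existing flags> is a placeholder for the flags you already had):

python run_bayesian_optimization.py <your existing flags> --splits_to_eval val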

AlexFridman commented 4 years ago

Thank you very much, @KevinMusgrave!

AlexFridman commented 4 years ago

Hello, @KevinMusgrave! It looks like we have the same issue in the bayes_opt script. Link. It should be val, I guess, but I'm not sure. Could you please take a look? Thanks!

KevinMusgrave commented 4 years ago

Hmm, the assumption in the bayesian optimization script is that you'll do optimization based on the validation set(s), and then test the best parameters on the test set. (The splits_to_eval variable appears in the function "test_best_model", which is called at the very end of the script.)

AlexFridman commented 4 years ago

My colleague told me that when she runs the BO script, it also runs evaluation on the train part of the data. Maybe there's another reason for it. 1st possible reason: we don't have a test set in our splitting... 2nd possible reason: in run.py, splits_to_eval has a default value of ['val'], but in the BO script it has no default value for the trial runs, and that's why it uses all splits for evaluation.

Should we set splits_to_eval to ['val'] in the BO script during the HP search?

KevinMusgrave commented 4 years ago

Re: 1st reason, I assume you're setting "special_split_scheme_name" to "predefined", since you're defining the train/val split yourself. If you're not setting that flag, then train/val/test splits will be created as described here: https://github.com/KevinMusgrave/powerful-benchmarker#split-schemes-and-cross-validation

Re: 2nd reason, actually the current default value for splits_to_eval in run.py is None, which means use all non-test splits. The bayes_opt script uses the same default value as run.py, so you're right, you'll have to set splits_to_eval to val.

AlexFridman commented 4 years ago

But don't you think this parameter should be set to val by default in the BO script (during the HP search), as well as in the run.py script?

KevinMusgrave commented 4 years ago

I don't think so. For my own purposes, I like to check accuracy on both the train and val set. In other words, during training, I get to see the accuracy on the train and val set, and then at the very end of bayesian optimization, I see the performance of the best model on the test set. (The best model is chosen based on val set accuracy.)

AlexFridman commented 4 years ago

Got it. Thank you, Kevin!

AlexFridman commented 4 years ago

Hi, @KevinMusgrave!

Two issues found:

  1. If we use the predefined scheme, we get the error BaseApiParser does not have attribute meta_record_keeper here, because meta_record_keeper is not set here, since self.split_manager.split_scheme_names contains only predefined.
  2. After running BO, when it evaluates on the test set (num_trials=3, num_epochs=4, save_interval=1), I see the following saved records (inside meta_eval):
    defaultdict(<class 'list'>, {'epoch': [-1, 0], 'NMI_level0': [0.6870503826879762, 0.6299021740129578], 'precision_at_1_level0': [0.9142857142857143, 0.8285714285714286], 'r_precision_level0': [0.7428571428571429, 0.7571428571428571], 'mean_average_r_precision_level0': [0.7136904761904762, 0.7041666666666666], 'best_epoch': [-1, -1], 'best_accuracy': [0.7136904761904762, 0.7136904761904762]})

    Could you please explain how epoch and best_epoch are formed? Why -1 and why only 2 records? Regards, Alex

KevinMusgrave commented 4 years ago
  1. If you're using "predefined", then cross validation isn't supported. (Sorry, I probably should have mentioned that earlier. Unfortunately I haven't put this functionality in yet.) So for example, if your predefined split is train/val/test, then there is only 1 validation set, but all the "meta" stuff is for collecting models from multiple cross-validation folds. Since there is only one validation set when you use "predefined", the "meta" stuff is not applicable, so you should set the "meta_testing_method" flag to null (see the example command after this list). I think it should work then. You'll still get optimization_plot.html and best_parameters.yaml, and the test set performance will be in /predefined/saved_pkls.

  2. The "epoch" key in that meta log is a misnomer. It should be something like "evaluation iteration". Every time you run meta evaluation, it will just append an incremented value to that list. Most likely, you'll run meta evaluation once, so the list will be [-1, 0], where -1 refers to the untrained model, and 0 refers to the most recent evaluation. (The most recent evaluation always uses the trunk_best and embedder_best models saved in each sub-experiment.) Anyway, at the moment, if you're using predefined, then the meta eval stuff won't be applicable.

KevinMusgrave commented 4 years ago

@AlexFridman Were you able to get it working with "predefined"?