felixbur / nkululeko

Machine learning speaker characteristics
MIT License

Fix MLP not reporting the best metrics #131

Closed (bagustris, 3 weeks ago)

bagustris commented 3 weeks ago

This fixes "MLP did not report the best metrics at the end of training" (#130). Before this PR:

DEBUG runmanager: run 0
DEBUG model: value for loss not found, using default: cross                     
DEBUG model: using model with cross entropy loss function                       
DEBUG model: value for device not found, using default: cuda                    
DEBUG model: using layers {'l1':128, 'l2':64}                                   
DEBUG model: value for learning_rate not found, using default: 0.0001           
DEBUG model: value for num_workers not found, using default: 5                  
DEBUG modelrunner: run: 0 epoch: 0: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 1: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 2: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 3: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 4: result: test: 0.503 UAR                     
DEBUG modelrunner: run: 0 epoch: 5: result: test: 0.509 UAR                     
DEBUG modelrunner: run: 0 epoch: 6: result: test: 0.516 UAR                     
DEBUG modelrunner: run: 0 epoch: 7: result: test: 0.518 UAR                     
DEBUG modelrunner: run: 0 epoch: 8: result: test: 0.521 UAR                     
DEBUG modelrunner: run: 0 epoch: 9: result: test: 0.520 UAR                     
DEBUG modelrunner: run: 0 epoch: 10: result: test: 0.523 UAR                    
DEBUG modelrunner: run: 0 epoch: 11: result: test: 0.522 UAR                    
DEBUG modelrunner: run: 0 epoch: 12: result: test: 0.526 UAR                    
DEBUG modelrunner: run: 0 epoch: 13: result: test: 0.523 UAR                    
DEBUG modelrunner: run: 0 epoch: 14: result: test: 0.521 UAR                    
DEBUG modelrunner: run: 0 epoch: 15: result: test: 0.518 UAR                    
DEBUG modelrunner: run: 0 epoch: 16: result: test: 0.523 UAR                    
DEBUG modelrunner: run: 0 epoch: 17: result: test: 0.526 UAR                    
DEBUG modelrunner: run: 0 epoch: 18: result: test: 0.522 UAR                    
DEBUG modelrunner: run: 0 epoch: 19: result: test: 0.521 UAR                    
DEBUG modelrunner: plotting confusion matrix to train_dev_mlp_os_64-128_scale-standard_0_019_cnf
DEBUG reporter: epoch: 19, UAR: .52, (+-.508/.534), ACC: .961                   
DEBUG reporter: labels: [0, 1]                                                  
DEBUG reporter: auc: 0.521, pauc: 0.520                                         
DEBUG reporter: result per class (F1 score): [0.981, 0.073]       

After this PR, all metrics are reported using the best model from a specific epoch, not the last one.

DEBUG modelrunner: run: 0 epoch: 0: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 1: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 2: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 3: result: test: 0.500 UAR                     
DEBUG modelrunner: run: 0 epoch: 4: result: test: 0.502 UAR                     
DEBUG modelrunner: run: 0 epoch: 5: result: test: 0.504 UAR                     
DEBUG modelrunner: run: 0 epoch: 6: result: test: 0.513 UAR                     
DEBUG modelrunner: run: 0 epoch: 7: result: test: 0.517 UAR                     
DEBUG modelrunner: run: 0 epoch: 8: result: test: 0.522 UAR                     
DEBUG modelrunner: run: 0 epoch: 9: result: test: 0.521 UAR                     
DEBUG modelrunner: run: 0 epoch: 10: result: test: 0.525 UAR                    
DEBUG modelrunner: run: 0 epoch: 11: result: test: 0.530 UAR                    
DEBUG modelrunner: run: 0 epoch: 12: result: test: 0.526 UAR                    
DEBUG modelrunner: run: 0 epoch: 13: result: test: 0.529 UAR                    
DEBUG modelrunner: run: 0 epoch: 14: result: test: 0.531 UAR                    
DEBUG modelrunner: run: 0 epoch: 15: result: test: 0.530 UAR                    
DEBUG modelrunner: run: 0 epoch: 16: result: test: 0.530 UAR                    
DEBUG modelrunner: run: 0 epoch: 17: result: test: 0.535 UAR                    
DEBUG modelrunner: run: 0 epoch: 18: result: test: 0.529 UAR                    
DEBUG modelrunner: run: 0 epoch: 19: result: test: 0.528 UAR                    
DEBUG modelrunner: plotting confusion matrix to train_dev_mlp_os_64-128_scale-standard_0_019_cnf
DEBUG reporter: Best score at epoch: 17, UAR: .534, (+-.52/.55), ACC: .965      
DEBUG reporter: labels: [0, 1]                                                  
DEBUG reporter: auc: 0.535, pauc: 0.534 from epoch: 17                          
DEBUG reporter: result per class (F1 score): [0.982, 0.114] from epoch: 17      
WARNING experiment: Save experiment: Can't pickle the trained model so saving without it. (it should be stored anyway)
DEBUG experiment: Done, used 100.017 seconds                                    
DONE                                            
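The reporting change can be sketched as follows: instead of reporting the metrics of the final epoch, keep every epoch's result and report from the epoch with the highest UAR. This is a minimal illustration with hypothetical names, not the actual nkululeko implementation; the UAR values are taken from the log above.

```python
# Minimal sketch of best-epoch reporting: track each epoch's test result
# and report from the epoch with the highest UAR, not the last epoch.
from dataclasses import dataclass


@dataclass
class EpochResult:
    epoch: int
    uar: float  # unweighted average recall on the test split


def best_result(results):
    """Return the epoch result with the highest UAR."""
    return max(results, key=lambda r: r.uar)


# Last few epochs from the "after" log above (abridged).
results = [EpochResult(e, u) for e, u in
           [(15, 0.530), (16, 0.530), (17, 0.535), (18, 0.529), (19, 0.528)]]

best = best_result(results)
print(f"Best score at epoch: {best.epoch}, UAR: {best.uar:.3f}")
# → Best score at epoch: 17, UAR: 0.535
```

With this, the final reporter lines (confusion matrix, AUC, per-class F1) can all be computed from the stored predictions of the best epoch rather than the last one.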

For finetuning, there is no need to find the best epoch manually, since the argument `load_best_model_at_end` is already set in the training arguments.
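Assuming finetuning goes through the Hugging Face `transformers` `Trainer`, the relevant configuration fragment looks roughly like this (paths and the metric choice are illustrative, not taken from nkululeko):

```python
# Config fragment: with load_best_model_at_end=True, the Trainer restores
# the best checkpoint after training, so no manual best-epoch search is needed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",             # hypothetical output path
    num_train_epochs=20,
    evaluation_strategy="epoch",        # evaluate once per epoch
    save_strategy="epoch",              # must match the evaluation strategy
    load_best_model_at_end=True,        # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",  # criterion for "best" (assumption)
)
```

Note that `save_strategy` must match `evaluation_strategy`, otherwise `load_best_model_at_end` raises an error at construction time.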