jbingel opened this issue 6 years ago
We didn't report any single-task results; this could still be done and would also be interesting.
@bjerva can you tell me the hyperparameters for the systems we ended up submitting (the ones that you trained)?
Sure, lemme dig that up
These should be the best of my runs:
**FR Reg:**

```
aux_task_weight 0.3
batch_size 64
binary False
binary_vote_threshold None
concatenate_train_data True
dev_lang es
dropout 0.33
exp_dir ../experiments/es/final-fr-1/
exp_name final-fr-1
hidden_layers [20, 20]
lang_id_weight 0.5
lr 0.003
max_epochs 200
model_dir ../experiments/es/final-fr-1/models
official_dev False
patience 15
random_forest [100, 100, 100]
restarts 10
scale_features True
share_input True
test_lang fr
train_langs ['en', 'de', 'es']
```
**DE Clf:**

```
aux_task_weight 0.3
batch_size 64
binary True
binary_vote_threshold None
concatenate_train_data False
dev_lang de
dropout 0.33
exp_dir ../experiments/de/final-de-1/
exp_name final-de-1
hidden_layers [20, 20]
lang_id_weight 0.5
lr 0.003
max_epochs 200
model_dir ../experiments/de/final-de-1/models
official_dev False
patience 15
random_forest [100, 100, 100]
restarts 10
scale_features True
share_input False
test_lang de
train_langs ['en', 'de', 'es']
```
```
Round 1 (de), F1=0.7218, rank corr=0.5663
Round 2 (de), F1=0.7314, rank corr=0.5520
Round 3 (de), F1=0.7228, rank corr=0.5523
Round 4 (de), F1=0.7358, rank corr=0.5662
Round 5 (de), F1=0.7307, rank corr=0.5672
Round 6 (de), F1=0.7418, rank corr=0.5595
Round 7 (de), F1=0.7239, rank corr=0.5489
Round 8 (de), F1=0.6930, rank corr=0.5551
Round 9 (de), F1=0.7391, rank corr=0.5623
Round 10 (de), F1=0.7263, rank corr=0.5453
Random forest performance: 0.7004524886877829
Random forest performance: 0.7075812274368231
Random forest performance: 0.6975476839237057
Best threshold is 0.30000000000000004
Final P: 0.7347, R: 0.7518
Final F1 (de): 0.7432 (round mean: 0.7209, min: 0.6930, max: 0.7418)
```
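(Aside: the "Best threshold" line comes from sweeping a cutoff over the averaged binary votes of the NN restarts and the random forests; a minimal sketch of that selection step, with hypothetical names rather than the repo's actual code. The telltale `0.30000000000000004` suggests an `np.arange`-style sweep:)

```python
import numpy as np
from sklearn.metrics import f1_score

def sweep_vote_threshold(votes, y_true):
    """votes: (n_models, n_examples) array of binary predictions,
    e.g. from the 10 NN restarts plus the random forests.
    Picks the cutoff on the mean vote that maximizes dev F1."""
    mean_votes = votes.mean(axis=0)
    best_t, best_f1 = 0.5, -1.0
    # float accumulation in np.arange is why the log prints 0.30000000000000004
    for t in np.arange(0.1, 1.0, 0.1):
        f1 = f1_score(y_true, (mean_votes >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```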
**ES Clf:**

```
aux_task_weight 0.5
batch_size 64
binary True
binary_vote_threshold None
concatenate_train_data True
dev_lang es
dropout 0.33
exp_dir ../experiments/es/final-es-4/
exp_name final-es-4
hidden_layers [20, 20]
lang_id_weight 0.5
lr 0.003
max_epochs 200
model_dir ../experiments/es/final-es-4/models
official_dev False
patience 15
random_forest [100, 100, 100]
restarts 10
scale_features True
share_input True
test_lang es
train_langs ['es']
```
```
Round 1 (es), F1=0.6942, rank corr=0.5059
Round 2 (es), F1=0.6931, rank corr=0.4924
Round 3 (es), F1=0.6899, rank corr=0.4884
Round 4 (es), F1=0.6926, rank corr=0.4982
Round 5 (es), F1=0.6860, rank corr=0.4808
Round 6 (es), F1=0.6913, rank corr=0.4887
Round 7 (es), F1=0.6898, rank corr=0.4873
Round 8 (es), F1=0.6885, rank corr=0.4885
Round 9 (es), F1=0.6904, rank corr=0.4885
Round 10 (es), F1=0.6929, rank corr=0.4904
Random forest performance: 0.6936114732724903
Random forest performance: 0.6990881458966566
Random forest performance: 0.6882836143536532
Best threshold is 0.8
Final P: 0.7941, R: 0.6268
Final F1 (es): 0.7006 (round mean: 0.6915, min: 0.6860, max: 0.6991)
```
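(For anyone reading along: the hyperparameters above describe a small shared feed-forward net, `hidden_layers [20, 20]` with `dropout 0.33`, whose auxiliary losses are scaled by `aux_task_weight` and `lang_id_weight`. A rough PyTorch sketch of that shape, just to make the setup concrete; this is a reconstruction with hypothetical names, not the repo's code:)

```python
import torch.nn as nn

class SharedMTLNet(nn.Module):
    """Hypothetical reconstruction of the architecture implied by
    hidden_layers [20, 20] and dropout 0.33: a shared trunk, a main
    head (binary complexity or regression), and a language-ID head."""
    def __init__(self, n_features, n_langs):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 20), nn.ReLU(), nn.Dropout(0.33),
            nn.Linear(20, 20), nn.ReLU(), nn.Dropout(0.33),
        )
        self.main_head = nn.Linear(20, 1)        # complexity prediction
        self.lang_head = nn.Linear(20, n_langs)  # auxiliary language ID

    def forward(self, x):
        h = self.trunk(x)
        return self.main_head(h), self.lang_head(h)

# Training would combine the losses with the weights from the configs, e.g.
#   loss = main_loss + 0.5 * lang_id_loss   # lang_id_weight 0.5
# and scale any further auxiliary task by aux_task_weight (0.3 or 0.5 above).
```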
thanks!
"Which models are contributing the most? feed forward NN or multiple random forest models? Is the MTL helping at all? Can you show the performance of the models in isolation?"
Not sure we can pull off that analysis at this point. I'm not sure about this particular task anymore, but in the gaze-misreadings paper with Maria (which used the same code), it seemed like the NNs and RFs often complemented each other very nicely.
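(If someone does want to attempt it, isolating the contributions mostly means scoring each model family's votes separately before blending; a sketch with hypothetical names, assuming the per-model binary predictions are still available:)

```python
import numpy as np
from sklearn.metrics import f1_score

def isolate_and_blend(nn_votes, rf_votes, y_true, threshold=0.5):
    """Score the NN restarts and the random forests on their own,
    then the blend, to see which family contributes what."""
    def score(votes):
        pred = (np.asarray(votes).mean(axis=0) >= threshold).astype(int)
        return f1_score(y_true, pred)
    return {
        "nn_only": score(nn_votes),
        "rf_only": score(rf_votes),
        "blend": score(np.vstack([nn_votes, rf_votes])),
    }
```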