Add model checkpointing to ReutersTrainer

achyudh commented 5 years ago

Performance

RCV-1

	Accuracy	Avg. Precision	Avg. Recall	Avg. F1	BCE Loss
Before checkpointing
BiLSTM with Hidden Bottleneck Layer (Dev)	0.802	0.927	0.815	0.867	1.058
BiLSTM with Hidden Bottleneck Layer (Test)	0.783	0.921	0.780	0.845	1.252
After checkpointing
BiLSTM with Hidden Bottleneck Layer (Dev)	0.813	0.929	0.817	0.870	1.299
BiLSTM with Hidden Bottleneck Layer (Test)	0.789	0.915	0.781	0.843	1.670

Note: I am currently working on replacing the RCV-1 dataset with the 103-class Lewis split, so the results above are on the 90-class ModApte split. I ran the models for 30 epochs.

AAPD

	Accuracy	Avg. Precision	Avg. Recall	Avg. F1	BCE Loss
Before checkpointing
BiLSTM with Hidden Bottleneck Layer (Dev)	0.381	0.777	0.630	0.696	5.435
BiLSTM with Hidden Bottleneck Layer (Test)	0.363	0.773	0.611	0.683	5.622
After checkpointing
BiLSTM with Hidden Bottleneck Layer (Dev)	0.391	0.812	0.636	0.714	3.699
BiLSTM with Hidden Bottleneck Layer (Test)	0.359	0.811	0.610	0.697	3.792
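As a quick sanity check on the tables, the sketch below recomputes F1 from the reported precision and recall. It assumes the "Avg. F1" column is the harmonic mean of the "Avg. Precision" and "Avg. Recall" columns, which the thread does not state; the names `f1_from`, `ROWS`, and `max_gap` are made up for illustration. Under that assumption, every reported F1 agrees with the recomputed value to within 0.001, which is what rounding of the inputs would explain.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the two tables in this PR.
ROWS = {
    "RCV-1 table, dev, before": (0.927, 0.815, 0.867),
    "RCV-1 table, test, before": (0.921, 0.780, 0.845),
    "RCV-1 table, dev, after": (0.929, 0.817, 0.870),
    "RCV-1 table, test, after": (0.915, 0.781, 0.843),
    "AAPD, dev, before": (0.777, 0.630, 0.696),
    "AAPD, test, before": (0.773, 0.611, 0.683),
    "AAPD, dev, after": (0.812, 0.636, 0.714),
    "AAPD, test, after": (0.811, 0.610, 0.697),
}


def max_gap(rows):
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_from(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for name, (p, r, reported) in ROWS.items():
        print(f"{name}: recomputed={f1_from(p, r):.4f} reported={reported:.3f}")
    print(f"max gap: {max_gap(ROWS):.4f}")
```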

TODO: Verify performance of LSTM with Regularization.

Ashutosh-Adhikari commented 5 years ago

@daemon can you review this with respect to what we discussed on Slack about checkpointing?

achyudh commented 5 years ago

Added the performance metrics for AAPD. Minor improvements (about 1 percentage point) on the LSTM baseline.

achyudh commented 5 years ago

@daemon I was working on more changes based on your suggestions, such as checkpointing only at the end of an epoch. Should I just create a separate PR for those?
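For readers following the thread, here is a minimal sketch of what checkpointing only at the end of an epoch could look like. It is not the code in this PR: `BestCheckpoint`, `train`, and the callback parameters are hypothetical names, and the choice of dev score as the selection criterion is an assumption. The sketch is framework-agnostic and keeps the best state in memory; in a PyTorch trainer the state would typically be `model.state_dict()` and would be written to disk with `torch.save`.

```python
import copy


class BestCheckpoint:
    """Keeps a copy of the model state with the best dev score seen so far."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, dev_score, state):
        """Call once per epoch; returns True if a new best state was stored."""
        if dev_score > self.best_score:
            self.best_score = dev_score
            self.best_epoch = epoch
            # Deep copy so later training steps cannot mutate the saved state.
            self.best_state = copy.deepcopy(state)
            return True
        return False


def train(num_epochs, train_one_epoch, evaluate_dev, get_state):
    """Hypothetical training loop: evaluate and checkpoint at each epoch end."""
    checkpoint = BestCheckpoint()
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)       # all mini-batches for this epoch
        dev_score = evaluate_dev()   # e.g. dev F1, computed once per epoch
        checkpoint.update(epoch, dev_score, get_state())
    return checkpoint


if __name__ == "__main__":
    # Toy stand-ins for a real model and evaluator.
    state = {"w": 0}
    scores = iter([0.70, 0.87, 0.85])

    def step(epoch):
        state["w"] = epoch

    ckpt = train(3, step, lambda: next(scores), lambda: state)
    print(ckpt.best_epoch, ckpt.best_score, ckpt.best_state)
```

Evaluating once per epoch keeps the overhead of dev-set evaluation and saving bounded, at the cost of possibly missing a better state that occurs mid-epoch.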

castorini / castor