castorini / castor

PyTorch deep learning models for text processing
http://castor.ai/
Apache License 2.0

Add model checkpointing to ReutersTrainer #158

Closed: achyudh closed this 5 years ago

achyudh commented 5 years ago

Performance

RCV-1

| Model | Accuracy | Avg. Precision | Avg. Recall | Avg. F1 | BCE Loss |
|---|---|---|---|---|---|
| Before checkpointing | | | | | |
| BiLSTM with Hidden Bottleneck Layer (Dev) | 0.802 | 0.927 | 0.815 | 0.867 | 1.058 |
| BiLSTM with Hidden Bottleneck Layer (Test) | 0.783 | 0.921 | 0.780 | 0.845 | 1.252 |
| After checkpointing | | | | | |
| BiLSTM with Hidden Bottleneck Layer (Dev) | 0.813 | 0.929 | 0.817 | 0.870 | 1.299 |
| BiLSTM with Hidden Bottleneck Layer (Test) | 0.789 | 0.915 | 0.781 | 0.843 | 1.670 |

Note: I am currently working on replacing the RCV-1 dataset with the 103-class Lewis split, so the following results are on the 90-class ModApte split. I ran the models for 30 epochs.

AAPD

| Model | Accuracy | Avg. Precision | Avg. Recall | Avg. F1 | BCE Loss |
|---|---|---|---|---|---|
| Before checkpointing | | | | | |
| BiLSTM with Hidden Bottleneck Layer (Dev) | 0.381 | 0.777 | 0.630 | 0.696 | 5.435 |
| BiLSTM with Hidden Bottleneck Layer (Test) | 0.363 | 0.773 | 0.611 | 0.683 | 5.622 |
| After checkpointing | | | | | |
| BiLSTM with Hidden Bottleneck Layer (Dev) | 0.391 | 0.812 | 0.636 | 0.714 | 3.699 |
| BiLSTM with Hidden Bottleneck Layer (Test) | 0.359 | 0.811 | 0.610 | 0.697 | 3.792 |

TODO: Verify performance of LSTM with Regularization.

Ashutosh-Adhikari commented 5 years ago

@daemon can you review this with respect to the checkpointing approach we discussed on Slack?

achyudh commented 5 years ago

Added the performance metrics for AAPD. Minor improvements (1 percentage point) on LSTM Baseline.
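
For reference, the before/after differences can be recomputed from the AAPD table above. This is only a small sketch: the numbers are copied from the table, and the helper name `delta_points` and the dictionary layout are made up for illustration, not taken from the PR.

```python
# Values copied from the AAPD table (BiLSTM with Hidden Bottleneck Layer).
before = {
    "dev": {"accuracy": 0.381, "precision": 0.777, "recall": 0.630, "f1": 0.696},
    "test": {"accuracy": 0.363, "precision": 0.773, "recall": 0.611, "f1": 0.683},
}
after = {
    "dev": {"accuracy": 0.391, "precision": 0.812, "recall": 0.636, "f1": 0.714},
    "test": {"accuracy": 0.359, "precision": 0.811, "recall": 0.610, "f1": 0.697},
}


def delta_points(before, after):
    """Return after-minus-before differences in percentage points (hypothetical helper)."""
    return {
        split: {
            metric: round((after[split][metric] - before[split][metric]) * 100, 1)
            for metric in before[split]
        }
        for split in before
    }


deltas = delta_points(before, after)
for split, metrics in deltas.items():
    print(split, metrics)
```

Positive values mean the checkpointed run scored higher on that metric; negative values mean it scored lower.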

achyudh commented 5 years ago

@daemon I was working on more changes based on your suggestions, such as checkpointing only at the end of an epoch. Should I just create a separate PR for those?
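
To make the end-of-epoch idea concrete, here is a minimal, framework-agnostic sketch. It is not the code in this PR: `BestCheckpoint`, `train`, and the callback names are hypothetical, and selecting on the dev score (for example dev F1) is an assumption. With PyTorch, `get_state` would typically return `model.state_dict()`, and the kept state could be written to disk with `torch.save`.

```python
import copy


class BestCheckpoint:
    """Keeps a copy of the model state from the epoch with the best dev score."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, dev_score, state):
        """Store `state` if `dev_score` beats the best score seen so far."""
        if dev_score > self.best_score:
            self.best_score = dev_score
            self.best_epoch = epoch
            # Deep copy so later training steps cannot mutate the saved state.
            self.best_state = copy.deepcopy(state)
            return True
        return False


def train(num_epochs, train_one_epoch, evaluate_dev, get_state):
    """Run training and checkpoint once per epoch, after dev evaluation."""
    checkpoint = BestCheckpoint()
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)
        dev_score = evaluate_dev(epoch)
        checkpoint.update(epoch, dev_score, get_state())
    return checkpoint


# Toy usage: the "model" is a dict and the dev scores are fixed in advance.
state = {"weights": [0.0]}
dev_scores = {1: 0.60, 2: 0.71, 3: 0.69}


def toy_train(epoch):
    state["weights"][0] = float(epoch)


result = train(3, toy_train, dev_scores.get, lambda: state)
print(result.best_epoch, result.best_score, result.best_state)
```

Checkpointing once per epoch keeps the number of saves small and ties each saved state to a single dev evaluation, at the cost of missing a better state that might occur mid-epoch.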