dalgu90 / icd-coding-benchmark

Automatic ICD coding benchmark based on the MIMIC dataset
MIT License

Implement the base trainer, loss, and metrics #17

Closed dalgu90 closed 2 years ago

dalgu90 commented 2 years ago

I've now refactored the CAML model; some utility methods still remain in src/utils/caml_utils.py.
The current performance of CNN/CAML/DR-CAML on the MIMIC-III top-50 dataset is shown below. The numbers look far better than those reported in the original CAML paper, and I suspect this is because our version of the dataset has many more examples (and we also use a different set of top-50 codes).
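For reference, here is a minimal sketch of how metrics like the ones reported below are typically computed for multi-label ICD coding. This is not the repo's actual implementation; it assumes raw `logits` of shape `(num_examples, num_labels)` and a binary `labels` matrix of the same shape.

```python
# Sketch of the reported metrics (prec@k, macro/micro F1, macro/micro AUC).
# Assumption: logits and labels are NumPy arrays of shape (N, num_labels),
# and every label appears at least once in the test set (required for AUC).
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

def precision_at_k(labels, scores, k):
    # For each example, take the k highest-scoring labels and measure
    # what fraction of them are true labels, averaged over examples.
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = np.take_along_axis(labels, topk, axis=1)
    return hits.sum(axis=1).mean() / k

def evaluate(labels, logits, threshold=0.5):
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid for multi-label outputs
    preds = (probs >= threshold).astype(int)   # hard predictions for F1
    return {
        "prec_at_5": precision_at_k(labels, probs, 5),
        "prec_at_8": precision_at_k(labels, probs, 8),
        "macro_f1": f1_score(labels, preds, average="macro"),
        "micro_f1": f1_score(labels, preds, average="micro"),
        "macro_auc": roc_auc_score(labels, probs, average="macro"),
        "micro_auc": roc_auc_score(labels, probs, average="micro"),
    }
```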

Vanilla CNN

Checkpoint loaded from best-1.pth
Evaluate on test dataset
   prec_at_5: 0.648547
   prec_at_8: 0.527984
    macro_f1: 0.635654
    micro_f1: 0.689384
   macro_auc: 0.913144
   micro_auc: 0.936200
Save result on results/CNN_mimic3_50/test_result.json

CAML

Checkpoint loaded from best-22.pth
Evaluate on test dataset
   prec_at_5: 0.651824
   prec_at_8: 0.533704
    macro_f1: 0.615738
    micro_f1: 0.667918
   macro_auc: 0.914351
   micro_auc: 0.940175
Save result on results/CAML_mimic3_50/test_result.json

DR-CAML

Checkpoint loaded from best-23.pth
Evaluate on test dataset
   prec_at_5: 0.651144
   prec_at_8: 0.532854
    macro_f1: 0.628317
    micro_f1: 0.672664
   macro_auc: 0.914375
   micro_auc: 0.940238
Save result on results/DRCAML_mimic3_50/test_result.json
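Since each run saves its metrics under results/, the three models can be compared side by side with a small snippet like the one below (hypothetical, assuming each test_result.json stores the metric dictionary printed above):

```python
# Hypothetical helper: load and print the saved test metrics for each run.
import json

for run in ["CNN_mimic3_50", "CAML_mimic3_50", "DRCAML_mimic3_50"]:
    with open(f"results/{run}/test_result.json") as f:
        print(run, json.load(f))
```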