arkilpatel / SVAMP

NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems?
MIT License

ASDiv-A experiment results for GTS do not match the paper #7

Open the-jb opened 2 years ago

the-jb commented 2 years ago

I ran the following script to test RoBERTa+GTS with the best hyperparameter settings from the paper (Table 19):

python -m src.main -mode train -gpu 0 -embedding roberta -emb_name roberta-base -embedding_size 768 -hidden_size 512 -depth 2 -lr 0.001 -emb_lr 8e-6 -batch_size 4 -epochs 50 -dataset cv_asdiv-a -full_cv -run_name run_cv_asdiv-a

It wrote the following results to the "out/CV_results_cv_asdiv-a.json" file:

{
    "run_cv_asdiv-a": {
        "run_name": "run_cv_asdiv-a",
        "5-fold avg acc score": "0.770747740345111",
        "Fold0 acc": 0.7815126050420168,
        "Fold1 acc": 0.8067226890756303,
        "Fold2 acc": 0.7773109243697479,
        "Fold3 acc": 0.7637130801687764,
        "Fold4 acc": 0.7293233082706767,
        "epochs": 50,
        "embedding": "roberta",
        "embedding_size": 768,
        "embedding_lr": 8e-06,
        "freeze_emb": false,
        "cell_type": "lstm",
        "hidden_size": 512,
        "depth": 2,
        "lr": 0.001,
        "batch_size": 4,
        "dropout": 0.5
    }
}

However, the paper reports an accuracy of 81.2 for this setting (Table 2), while my 5-fold average is only about 77.1.

What am I missing? How can I reproduce the 81.2 accuracy on ASDiv-A with GTS?
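
For reference, here is a small sanity check on how the per-fold numbers relate to the reported average. It is only a sketch: the fold sizes in `fold_sizes` are an assumption inferred from the accuracy fractions (for example 186/238), not values read from the dataset files or from the repository code, and the variable names are illustrative.

```python
import math

# Per-fold accuracies and reported average, copied from
# out/CV_results_cv_asdiv-a.json above.
fold_acc = [
    0.7815126050420168,
    0.8067226890756303,
    0.7773109243697479,
    0.7637130801687764,
    0.7293233082706767,
]
reported_avg = 0.770747740345111

# ASSUMPTION: test-set size of each fold, inferred from the fractions
# above; these are not taken from the actual cv_asdiv-a files.
fold_sizes = [238, 238, 238, 237, 266]

# Plain (unweighted) mean of the five fold accuracies.
macro_avg = sum(fold_acc) / len(fold_acc)

# Size-weighted mean: total correct predictions over total examples.
correct = [round(acc * n) for acc, n in zip(fold_acc, fold_sizes)]
micro_avg = sum(correct) / sum(fold_sizes)

print(f"unweighted mean of folds: {macro_avg:.4f}")
print(f"size-weighted mean:       {micro_avg:.4f}")
print(f"reported 5-fold average:  {reported_avg:.4f}")
print("weighted mean matches reported:",
      math.isclose(micro_avg, reported_avg, rel_tol=1e-9))
```

Under these assumed fold sizes, the reported average is consistent with a size-weighted mean rather than a plain mean of the five folds. Either way the result stays roughly four points below the 81.2 in Table 2, so the averaging method alone would not explain the gap.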