HLTCHKUST / PAML

Personalizing Dialogue Agents via Meta-Learning

k-shot consistency C score is different from what mentioned in paper #5

Open soroushjavdan opened 5 years ago

soroushjavdan commented 5 years ago

First, I would like to thank you for your contribution. I trained your model exactly as described in the documentation, but I got Entl_b = 0.0879 in the printed results. I checked and found that this is the C score (am I right?). The problem is that in the paper the C score is reported as 0.2. By the way, the Entl_b I mentioned above was for the checkpoint with loss 46.5833 (the last checkpoint). Thanks in advance.

zlinao commented 5 years ago

Hi, thanks for your interest in our work. After fine-tuning, the result file saves the results for different iterations. Normally Entl_b increases with the iteration number, and the result at the 10th iteration should be close to 0.2.
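For reference, the paper's consistency C score is computed with a pretrained NLI model: each generated response is compared against the speaker's persona sentences, the predicted label is mapped to entailment = +1, neutral = 0, contradiction = -1, and the values are averaged. A minimal sketch of that scoring rule (the `nli_predict` callable and the `toy_nli` stub below are hypothetical stand-ins, not the repo's actual NLI model):

```python
# Hedged sketch of the persona-consistency C score (printed as Entl_b here).
# An NLI label for each (response, persona sentence) pair is mapped to
# {+1, 0, -1} and averaged over all pairs.

def consistency_score(responses, personas, nli_predict):
    """Average persona-entailment value over all (response, persona) pairs.

    responses:    list of generated response strings
    personas:     list of lists of persona sentences (one list per response)
    nli_predict:  callable(response, persona) -> label string
    """
    label_to_value = {"entailment": 1, "neutral": 0, "contradiction": -1}
    total, count = 0, 0
    for response, persona_sents in zip(responses, personas):
        for persona in persona_sents:
            total += label_to_value[nli_predict(response, persona)]
            count += 1
    return total / count if count else 0.0

# Toy NLI stub for illustration only: exact substring match = entailment.
def toy_nli(response, persona):
    return "entailment" if persona in response else "neutral"

score = consistency_score(
    ["i love dogs", "hello there"],
    [["i love dogs"], ["i hate cats"]],
    toy_nli,
)
```

With a real NLI classifier in place of `toy_nli`, the score falls in [-1, 1], which matches the small positive values in the tables below.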

soroushjavdan commented 5 years ago

Thanks for your reply. This is the result I got after fine-tuning:

| epoch | loss | Perplexity | Entl_b | Bleu_b |
|---|---|---|---|---|
| 0 | 3.8042 | 44.8887 | -0.1415 | 0.4747 |
| 1 | 3.7168 | 41.1305 | -0.1400 | 0.5077 |
| 3 | 3.6794 | 39.6218 | -0.0622 | 0.5583 |
| 5 | 3.6779 | 39.5636 | -0.0066 | 0.5623 |
| 7 | 3.6881 | 39.9686 | 0.0450 | 0.5607 |
| 10 | 3.7241 | 41.4353 | 0.0879 | 0.6649 |

I used the exact command you specified in the README, only without --cuda (that shouldn't have any effect on the result, right?), but Entl_b is not close to 0.2.

zlinao commented 5 years ago

Yes, it shouldn't affect the result. But have you checked the other baselines? In our experiments, Entl_b before fine-tuning always starts around 0 and then increases roughly linearly with the epoch number.
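The "starts near 0, increases roughly linearly" claim can be sanity-checked against the fine-tuning numbers reported earlier in this thread by fitting a least-squares slope to Entl_b over epochs (values copied from the table above; this is just arithmetic, not part of the repo):

```python
# Epochs and Entl_b values from the fine-tuning run reported in this thread.
epochs = [0, 1, 3, 5, 7, 10]
entl_b = [-0.1415, -0.1400, -0.0622, -0.0066, 0.0450, 0.0879]

# Least-squares slope, computed by hand to avoid any dependencies.
n = len(epochs)
mean_x = sum(epochs) / n
mean_y = sum(entl_b) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(epochs, entl_b)) \
        / sum((x - mean_x) ** 2 for x in epochs)

print(f"slope per epoch: {slope:.4f}")
```

The slope is about 0.025 per epoch, so the trend is indeed close to linear, but starting from -0.14 rather than 0 it only reaches roughly 0.09 by epoch 10, short of the reported 0.2.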

soroushjavdan commented 5 years ago

```shell
python MAML.py --cuda --model trs --batch_size 16 --use_sgd --lr 0.01 --meta_lr 0.0003 --meta_batch_size 16 --meta_optimizer adam --pretrain_emb --weight_sharing --emb_dim 300 --hidden_dim 300 --fix_dialnum_train --pointer_gen --save_path save/paml/
```

This was the command in the documentation. I added --universal and the result improved a little. Is there any other parameter I should add to get the result mentioned in the paper?

The parameters used for training were:

```
act=False, act_loss_weight=0.001, batch_size=16, beam_size=5, cuda=True, depth=40, emb_dim=300, filter=50, fix_dialnum_train=True, heads=4, hidden_dim=300, hop=6, is_coverage=False, k_shot=20, label_smoothing=False, load_frompretrain='None', lr=0.01, mate_interation=1, max_dec_steps=20, max_enc_steps=400, max_grad_norm=2.0, meta_batch_size=16, meta_lr=0.0003, meta_optimizer='adam', min_dec_steps=5, model='trs', noam=False, persona=False, pointer_gen=True, pretrain_emb=True, save_path='save/paml/', save_path_dataset='save/', test=False, universal=True, use_oov_emb=False, use_sgd=True, weight_sharing=True
```

| epoch | loss | Perplexity | Entl_b | Bleu_b |
|---|---|---|---|---|
| 0 | 3.7992 | 44.6648 | -0.1142 | 0.7208 |
| 1 | 3.7202 | 41.2723 | -0.0311 | 0.6597 |
| 3 | 3.6961 | 40.2890 | 0.0441 | 0.6779 |
| 5 | 3.6993 | 40.4186 | 0.0661 | 0.7221 |
| 7 | 3.7138 | 41.0085 | 0.1160 | 0.6755 |
| 10 | 3.7484 | 42.4548 | 0.1330 | 0.7584 |

Thanks again for your time

soroushjavdan commented 5 years ago

I checked the other baselines and the results were quite strange! For this command (no_persona):

```shell
python main.py --cuda --model trs --pretrain_emb --weight_sharing --label_smoothing --noam --emb_dim 300 --hidden_dim 300 --pointer_gen --save_path save/no_persona/
```

I got:

| epoch | loss | Perplexity | Entl_b | Bleu_b |
|---|---|---|---|---|
| 0 | 3.4054 | 30.1277 | 0.0586 | 1.3500 |
| 1 | 3.3736 | 29.1847 | 0.1005 | 1.3565 |
| 3 | 3.3618 | 28.8399 | 0.1364 | 1.5074 |
| 5 | 3.3570 | 28.7043 | 0.1427 | 1.4774 |
| 7 | 3.3550 | 28.6447 | 0.1515 | 1.4446 |
| 10 | 3.3549 | 28.6435 | 0.1580 | 1.3977 |

And for this command (persona):

```shell
python main.py --cuda --model trs --pretrain_emb --weight_sharing --label_smoothing --noam --emb_dim 300 --hidden_dim 300 --pointer_gen --persona --save_path save/persona/
```

I got:

| epoch | loss | Perplexity | Entl_b | Bleu_b |
|---|---|---|---|---|
| 0 | 3.5966 | 36.4736 | -0.0332 | 0.9343 |
| 1 | 3.5605 | 35.1796 | -0.0295 | 0.9113 |
| 3 | 3.5335 | 34.2427 | -0.0180 | 0.8793 |
| 5 | 3.5187 | 33.7421 | -0.0139 | 0.9127 |
| 7 | 3.5090 | 33.4142 | -0.0026 | 0.9556 |
| 10 | 3.5006 | 33.1345 | 0.0126 | 0.8601 |

Also, I didn't change the code; these results were obtained by running the exact code in your repo.

Guaguago commented 3 years ago

Hello, what is the full name of "Entl_b"? Why not just name it "c_score" in the code, as in the paper?

xiuzbl commented 3 years ago

I also cannot reproduce the scores reported in the paper using the code in this repository.