I am trying to reproduce the experiments for my thesis, and I am having a hard time getting the same numbers.
In your paper, you report: "In all experiments ... the numbers are averaged over 20 runs".
But based on the output file:
...
2020-03-09 19:52:52,220 : INFO : Epochs: 19
2020-03-09 19:52:52,220 : INFO : Loss: 0.02750968653033072
2020-03-09 19:52:53,346 : INFO : Valid:
2020-03-09 19:52:54,032 : INFO : slot f1: 97.74259747874524
2020-03-09 19:52:54,033 : INFO : intent accuracy: 96.6
2020-03-09 19:52:54,033 : INFO : semantic error(intent, slots are all correct): 89.60000000000001
2020-03-09 19:52:54,033 : INFO : Test:
2020-03-09 19:52:54,948 : INFO : slot f1: 95.40960451977402
2020-03-09 19:52:54,948 : INFO : intent accuracy: 95.63269876819709
2020-03-09 19:52:54,948 : INFO : semantic error(intent, slots are all correct): 84.5464725643897
2020-03-09 19:53:11,712 : INFO : Step: 5600
2020-03-09 19:53:11,714 : INFO : Epochs: 20
2020-03-09 19:53:11,714 : INFO : Loss: 0.027824909109663818
2020-03-09 19:53:12,100 : INFO : Valid:
2020-03-09 19:53:12,613 : INFO : slot f1: 97.48538011695906
2020-03-09 19:53:12,613 : INFO : intent accuracy: 97.39999999999999
2020-03-09 19:53:12,613 : INFO : semantic error(intent, slots are all correct): 89.8
2020-03-09 19:53:12,613 : INFO : Test:
2020-03-09 19:53:13,270 : INFO : slot f1: 95.26501766784452
2020-03-09 19:53:13,270 : INFO : intent accuracy: 95.40873460246361
2020-03-09 19:53:13,270 : INFO : semantic error(intent, slots are all correct): 83.87458006718926
I am confused about which set of numbers represents a single run.
Since an early-stopping strategy is applied, do I understand correctly that the representative result is the last output (the Test block under "Epochs: 20" above), and that these numbers are then averaged over the 20 runs?
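In case it makes the question more concrete, this is a rough Python sketch of the two readings I can think of. All function names are my own and hypothetical (nothing here comes from your repository), and for the "best validation epoch" reading I am assuming, also hypothetically, that early stopping is driven by validation slot f1:

```python
from statistics import mean

# Hypothetical helpers for illustration only; none of these names come from the repository.

def parse_epochs(log_text):
    """Collect the Valid/Test metrics logged for each 'Epochs: N' block."""
    epochs, current, split = [], None, None
    for line in log_text.splitlines():
        if " : INFO : " not in line:
            continue  # skip elisions such as "..." and non-log lines
        msg = line.split(" : INFO : ", 1)[1].strip()
        if msg.startswith("Epochs:"):
            current = {"epoch": int(msg.split(":", 1)[1]), "valid": {}, "test": {}}
            epochs.append(current)
            split = None
        elif msg.startswith("Step:"):
            split = None  # a new evaluation block is starting
        elif msg in ("Valid:", "Test:"):
            split = msg[:-1].lower()  # "valid" or "test"
        elif current is not None and split is not None and ": " in msg:
            name, value = msg.rsplit(": ", 1)
            current[split][name] = float(value)
    return epochs


def select_run_result(epochs, rule="last", metric="slot f1"):
    """Pick the test metrics that represent one run.

    rule="last":       the final logged epoch (my current reading).
    rule="best_valid": the epoch with the highest validation `metric`
                       (assumed early-stopping criterion, not confirmed).
    """
    if rule == "last":
        chosen = epochs[-1]
    elif rule == "best_valid":
        chosen = max(epochs, key=lambda e: e["valid"][metric])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return chosen["test"]


def average_over_runs(run_results):
    """Average each test metric across independent runs (e.g. 20 seeds)."""
    return {name: mean(r[name] for r in run_results) for name in run_results[0]}
```

On the excerpt above the two readings already disagree: "last" picks the Epochs: 20 test numbers, while "best_valid" by validation slot f1 picks Epochs: 19, so the averaged results would differ depending on which one the paper used.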
I would appreciate it if somebody could clarify.
Thanks!