When I train the model with the CoNLL-2012 data, the dev CoNLL score is 0.657888797045302 at the first epoch, and it remains stable at around 0.658. This is even better than your reported result of 0.657.
The dev conll score of the first 10 epochs:
(0, 0.657888797045302)
(1, 0.6585094296994586)
(2, 0.6582570233258541)
(3, 0.6582991986114178)
(4, 0.6590045611564873)
(5, 0.6580189901300686)
(6, 0.6585166713504456)
(7, 0.6574807587739219)
(8, 0.660184576001444)
(9, 0.6592850994459925)
(10, 0.6583200324261912)
All I did was follow the 4 steps in the instructions. Do you have any idea why this happens?