I have tried your code on multiple datasets:
Followed by the corresponding evaluation:
I am getting relatively bad scores (EM/F1):
which suggests that I am not using the proper hyper-parameters. Do you think that explains it? If so, I would appreciate more clarity on this sentence from your paper: "We emphasize that in all our experiments we use exactly the same training procedure for all datasets, with minimal hyper-parameter tuning.", especially with respect to "minimal hyper-parameter tuning".