Closed by lileicv 5 years ago
Hi lileicv,
We tried a few parameter settings (without doing a systematic hyperparameter search), and these seemed to work well for our implementation. Note also that this implementation drops the pre-training step and uses GloVe word embeddings instead, so it is possible that the values that work for us differ from the paper's.
Enrico & Aurel
The parameters you set are different from those in the original paper. Can you explain why?
For example, p_mult is 5.0 in the paper, but it is 0.02 in your code.