AladarMiao opened this issue 5 years ago
Hi, sorry for the delay. Could you please specify which number in the paper you would like to compare to, and whether you got a lower or a higher accuracy number?
Regarding our model architecture, it's a standard BiLSTM with dropout = 0.2, hidden size = 256, relu activation, and GloVe embeddings, using the last/first state vector of the forward/backward LSTM. What's your model configuration?
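If it helps, that configuration corresponds roughly to the following PyTorch sketch (illustrative only, not our actual implementation; the embedding layer would be initialized from the GloVe vectors):

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Rough sketch of the configuration described above.

    Names and defaults are illustrative, not our actual code; the
    embedding layer would be initialized from GloVe vectors.
    """
    def __init__(self, vocab_size=30000, embed_dim=300, hidden=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.dropout = nn.Dropout(p=0.2)
        self.bilstm = nn.LSTM(embed_dim, hidden,
                              batch_first=True, bidirectional=True)
        # Dense projection back to 256; the relu placement is assumed here.
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, token_ids):  # token_ids: (batch, seq_len) int tensor
        _, (h_n, _) = self.bilstm(self.dropout(self.embedding(token_ids)))
        # h_n[-2] / h_n[-1]: final states of the forward / backward LSTM
        # (more on the state selection in my next comment).
        return torch.relu(self.proj(torch.cat([h_n[-2], h_n[-1]], dim=-1)))
```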
I am currently using a self-trained embedding, a BiLSTM, the last state vector, concatenation, and a dense layer at the end. If what you stated is the case, where does cosine similarity come in? I am comparing my model with what's stated on page 8 of the paper, where the BiLSTM achieved an accuracy of 86.3 and an AUC of 91.6.
Just to be more precise: we take the state at the last token for the forward LSTM and the state at the first token for the backward LSTM, concatenate the two states, and add a dense layer to project them to the required dimension (256).
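In PyTorch terms, the state selection looks roughly like this (a minimal runnable sketch with dummy inputs, not our actual code):

```python
import torch
import torch.nn as nn

# Minimal demonstration of the state selection with dummy inputs.
bilstm = nn.LSTM(input_size=300, hidden_size=256,
                 batch_first=True, bidirectional=True)
proj = nn.Linear(2 * 256, 256)

emb = torch.randn(8, 20, 300)   # (batch, seq_len, embed_dim) dummy embeddings
_, (h_n, _) = bilstm(emb)
fwd_last = h_n[-2]    # forward LSTM: state at the LAST token
bwd_first = h_n[-1]   # backward LSTM: state at the FIRST token
                      # (the backward pass runs right-to-left, so its
                      #  final state sits on the first token)
sent_vec = proj(torch.cat([fwd_last, bwd_first], dim=-1))
print(sent_vec.shape)  # torch.Size([8, 256])
```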
Thanks!
If I read the PAWS paper correctly, it states that BiLSTM+cosine similarity is one of the baseline models used to evaluate the PAWS dataset. I tried to replicate the experiment with a BiLSTM+cosine similarity model I designed, but my accuracy is still quite far from the number stated in the paper. Is there somewhere I can see how you defined the BiLSTM+cosine similarity model? It would be really helpful for my current study on paraphrase identification. Thanks in advance!
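For reference, the scoring part of my reenactment looks roughly like this (a sketch of my own design, not code from the paper):

```python
import torch.nn.functional as F

def paraphrase_score(encoder, s1_ids, s2_ids):
    """Score a sentence pair by cosine similarity of the two encodings.

    `paraphrase_score`, `encoder`, and the argument names are mine,
    not from the paper.
    """
    v1 = encoder(s1_ids)  # (batch, dim) sentence vector for sentence 1
    v2 = encoder(s2_ids)  # (batch, dim) sentence vector for sentence 2
    # Cosine similarity in [-1, 1]; I threshold it to compute accuracy.
    return F.cosine_similarity(v1, v2, dim=-1)
```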