Closed — lintool closed this 6 years ago
Baseline results: Logistic regression with tf-idf - 71.13% accuracy
Was this with Mallet or PyTorch? Can you check in the code for this? Maybe place it in the top-level directory.
What's the fair comparison to your RNN? What about LR using word embeddings?
This was using Python with scikit-learn. There is an IPython notebook there that you can play with. Also, I just trained on the train set and tested on the test set; I did not even use the dev set for these baselines. Look at 'relation_prediction/baselines.ipynb' (the models take some time to train).
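For concreteness, the scikit-learn baseline described above is presumably something like the following sketch (the example questions, relation labels, and solver settings here are illustrative assumptions, not the notebook's actual code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative data; the real experiment uses the train/test splits.
train_texts = ["what films did steven spielberg direct", "who wrote hamlet"]
train_labels = ["film.director.film", "book.author.works_written"]

# tf-idf features feeding a logistic regression classifier
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("lr", LogisticRegression(max_iter=1000)),
])
model.fit(train_texts, train_labels)

preds = model.predict(["who directed jaws"])
```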
Doesn't have to be right now, but can you check in an actual Python version as a properly packaged module with a clean interface? It would be nice to build a modular system that has the same interface but different implementations (LR, CNN, RNN, etc.) so we can plug and play with experiments.
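One way to realize the plug-and-play idea is an abstract base class that each model implements; this is only a suggested shape for the interface (class and method names are made up here, not existing repo code):

```python
from abc import ABC, abstractmethod

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class RelationPredictor(ABC):
    """Shared interface so LR/CNN/RNN backends are interchangeable."""

    @abstractmethod
    def train(self, questions, relations):
        """Fit the model on parallel lists of questions and relation labels."""

    @abstractmethod
    def predict(self, questions):
        """Return one predicted relation per question."""


class LogisticRegressionPredictor(RelationPredictor):
    """tf-idf + logistic regression backend behind the shared interface."""

    def __init__(self):
        self.model = Pipeline([
            ("tfidf", TfidfVectorizer()),
            ("lr", LogisticRegression(max_iter=1000)),
        ])

    def train(self, questions, relations):
        self.model.fit(questions, relations)

    def predict(self, questions):
        return self.model.predict(questions)


# Usage: any RelationPredictor subclass can be swapped in here.
predictor = LogisticRegressionPredictor()
predictor.train(["who wrote hamlet", "what films did spielberg direct"],
                ["book.author.works_written", "film.director.film"])
out = predictor.predict(["who wrote macbeth"])
```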
Logistic Regression with averaged word embeddings + top 300 relation words Accuracy on test set: 71.34%
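The averaged-word-embedding features mentioned above amount to mean-pooling the vectors of a question's words; a minimal sketch, assuming a random embedding table stands in for whatever pretrained vectors the experiment actually used:

```python
import numpy as np

# Illustrative assumption: a tiny vocabulary and random 50-d embeddings.
# The real baseline presumably used pretrained word vectors.
rng = np.random.default_rng(0)
vocab = {"who": 0, "directed": 1, "jaws": 2}
emb = rng.normal(size=(len(vocab), 50))

def avg_embedding(text):
    """Mean-pool the embeddings of in-vocabulary words; zeros if none match."""
    ids = [vocab[w] for w in text.split() if w in vocab]
    if not ids:
        return np.zeros(emb.shape[1])
    return emb[ids].mean(axis=0)

features = avg_embedding("who directed jaws")
```

These fixed-length vectors would then be the input features to the same `LogisticRegression` classifier used for the tf-idf runs.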
Hmm, so not any better than tf-idf, huh?
Logistic Regression with tf-idf on 1-gram + 2-gram: Accuracy on test set: 72.645%
Logistic Regression with tf-idf on 1-gram + 2-gram + 3-gram: Accuracy on test set: 71.253%
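In scikit-learn, the 1-gram + 2-gram (and + 3-gram) variants above differ only in the vectorizer's `ngram_range`; a small sketch on a made-up sentence:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# ngram_range=(1, 2) builds features from unigrams and bigrams;
# (1, 3) would additionally include trigrams.
uni_bi = TfidfVectorizer(ngram_range=(1, 2))
uni_bi.fit(["who directed jaws"])

terms = sorted(uni_bi.vocabulary_)
# vocabulary now mixes single words and two-word phrases
```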
Relation prediction results comparison: the LR results on the test set are higher than on the dev/valid set because of more training data. I trained on (train+dev) when evaluating on the test set, since there were no hyperparameters to tune on the dev set.
Let's run a logistic regression baseline for relation prediction. We can either use something like Mallet, or I suppose we can use one-hot vectors with a softmax?