Open de-code opened 6 years ago
I think you need a large dataset to train the vectors, otherwise you will not get anything meaningful. I was thinking of training the vectors on all the papers in PubMed Central (using only the sentences that cite another paper). However, I don't think this is that important at the moment. I don't have any overfitting. The problem is how to reduce all the vectors associated with one sentence down to a single vector, so that a classifier can be trained. At the moment I'm getting very poor results, and I think the problem is in the method used to reduce these vectors.
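For reference, the simplest reduction is element-wise mean pooling over the word vectors of a sentence. This is only a sketch of that baseline (the function name and the 4-dimensional toy vectors are made up for illustration), not the method currently in the code:

```python
import numpy as np

def sentence_vector(word_vectors):
    """Collapse a sentence's word vectors (n_words x dim) into one
    fixed-size vector by taking the element-wise mean."""
    return np.mean(word_vectors, axis=0)

# Three hypothetical 4-dimensional word vectors for one sentence.
words = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 1.0, 1.0, 1.0],
])

vec = sentence_vector(words)  # shape (4,), one vector per sentence
```

Mean pooling ignores word order, which may be part of why results are poor; weighted averages (e.g. by TF-IDF) are a common next step.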
Yes, you are probably right. I will look into this later; it might be good to revisit once you are more confident with the method.
Since our training set is so small, I could also imagine that smaller word vectors would make sense. That would also make the model faster, but we would obviously need to train the smaller word vectors first. One alternative for combating overfitting would be to use dropout.
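In case it helps, this is a minimal sketch of inverted dropout in plain NumPy (the function name and shapes are illustrative, not from our code); most frameworks provide this as a built-in layer:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each activation with probability p during
    training, and rescale survivors by 1/(1-p) so the expected value
    of each activation is unchanged. At inference time, pass through."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p  # True = keep this activation
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((2, 8))            # toy activations
y = dropout(x, p=0.5, rng=rng) # entries are either 0.0 or 2.0
```

With our small training set, a fairly aggressive rate (p around 0.5 on hidden layers) would be the usual starting point.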