rdatasculptor closed 4 years ago
I think you can. I haven't done this myself, but I think you can add and subtract the embeddings of the different words to achieve what you are looking for.
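The add/subtract idea can be sketched with plain vector arithmetic. This is only an illustration in Python, not ruimtehol code: the toy 3-dimensional vectors, the document names, and the helpers `combine` and `rank_documents` are all made up for the example. With a real model, the word and document embeddings would come from the trained model instead.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combine(positive, negative):
    """Build a query vector: sum of the positive vectors minus the negative ones."""
    query = [0.0] * len(positive[0])
    for vec in positive:
        query = [q + v for q, v in zip(query, vec)]
    for vec in negative:
        query = [q - v for q, v in zip(query, vec)]
    return query

def rank_documents(query, doc_embeddings, k=3):
    """Return the k documents most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in doc_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings, invented for this sketch.
word_embeddings = {
    "football": [1.0, 0.0, 0.2],
    "transfer": [0.0, 1.0, 0.1],
}
doc_embeddings = {
    "match_report": [0.9, 0.1, 0.2],
    "transfer_news": [0.6, 0.8, 0.1],
    "cooking": [0.0, 0.1, 1.0],
}

# Documents close to "football" but far from "transfer".
query = combine(
    positive=[word_embeddings["football"]],
    negative=[word_embeddings["transfer"]],
)
top = rank_documents(query, doc_embeddings, k=3)
for name, score in top:
    print(name, round(score, 3))
```

Subtracting a vector pushes the query away from that word, so documents that lean towards the unwanted word drop in the ranking even if they also match the wanted one.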
Thanks! I will give it a try somehow. Any chance of adding this as a feature to the predict function? Something like predict(model, text, exclude_docs=text, k=3)?
Maybe I'm misunderstanding the question. predict already has an argument basedoc, which does something similar: it limits the prediction to the set of docs you provide in basedoc (although maybe not in the trainmode setting that you refer to).
I was aware of basedoc. I am just looking for a way to find similar documents while also taking some dissimilar basedocs into account. I am sorry my question was not clear.
Why not just filter these out of basedoc, by computing the similarity between the texts in basedoc and the other docs that you do not want included?
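A minimal sketch of that filtering step, again in plain Python rather than ruimtehol: the helper `filter_basedoc`, the threshold value, and the toy vectors are assumptions made for the illustration. Each candidate is compared with every unwanted document and dropped when it is too similar to any of them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_basedoc(candidates, unwanted, threshold=0.5):
    """Keep candidates whose highest similarity to any unwanted doc is below threshold."""
    if not unwanted:
        return dict(candidates)
    kept = {}
    for name, vec in candidates.items():
        max_sim = max(cosine(vec, u) for u in unwanted)
        if max_sim < threshold:
            kept[name] = vec
    return kept

# Hypothetical toy embeddings, invented for this sketch.
candidates = {
    "match_report": [0.9, 0.1, 0.2],
    "transfer_news": [0.6, 0.8, 0.1],
    "cooking": [0.0, 0.1, 1.0],
}
unwanted = [[0.0, 1.0, 0.1]]  # embedding of a document we want to stay away from

kept = filter_basedoc(candidates, unwanted, threshold=0.5)
print(sorted(kept))
```

The texts that survive the filter could then be supplied as basedoc, so that the prediction is limited to them. The threshold is a tuning choice and would need to be picked per data set.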
I will figure some things out. Thanks!
In addition to the question I asked above, I want to give an example from the word2vec tutorials. What I meant to ask was: is it possible to do something like France + Berlin - Germany = Paris on a sentence or article level with Ruimtehol?
Yes.
Ruimtehol works like a charm. I use it to find similar articles based on words or sentences as input in the predict function.
I was wondering, could it be possible, or be made possible, to not only find similarity, but also to take the dissimilarity of certain words into account? E.g., find articles that are close to word1 but at a large distance from word2?