Is it possible to exclude similarity of e.g. sentences when predicting?

bnosac / ruimtehol

R package to Embed All the Things! using StarSpace

Mozilla Public License 2.0

Is it possible to exclude similarity of e.g. sentences when predicting? #25

Closed by rdatasculptor 4 years ago

rdatasculptor commented 4 years ago

Ruimtehol works like a charm. I use it to find similar articles based on words or sentences as input in the predict function.

I was wondering: would it be possible, or could it be made possible, to find similarity while also taking into account the dissimilarity to certain words? E.g. find articles that are close to word1 while being at a large distance from word2?

jwijffels commented 4 years ago

I think you can. I haven't done this myself, but I think you can add/subtract the embeddings of the different words to get what you are looking for.
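A minimal sketch of the add/subtract idea above, written in plain Python with toy vectors rather than a trained model. The function name `rank_with_exclusion`, the `weight` parameter, and all vectors are hypothetical and for illustration only; with ruimtehol the vectors would presumably come from the fitted model's word and document embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_with_exclusion(include_vec, exclude_vec, doc_vecs, weight=1.0):
    """Rank documents by similarity to (include - weight * exclude).

    Hypothetical helper: subtracting the unwanted embedding pushes the
    query away from documents that resemble it.
    """
    query = [a - weight * b for a, b in zip(include_vec, exclude_vec)]
    scored = [(name, cosine(query, vec)) for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy 2-dimensional embeddings (made up, not from a real model).
word1 = [1.0, 0.0]   # the word we want to be close to
word2 = [0.0, 1.0]   # the word we want to be far from
docs = {
    "doc_a": [1.0, 0.1],   # mostly about word1
    "doc_b": [1.0, 1.0],   # about both words
    "doc_c": [0.1, 1.0],   # mostly about word2
}

ranking = rank_with_exclusion(word1, word2, docs)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy setup the document that mentions both words drops to the middle of the ranking, and the one dominated by the unwanted word ends up last. The `weight` argument is one possible way to control how strongly the unwanted word is penalised.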

rdatasculptor commented 4 years ago

Thanks! I will give it a try somehow. Any chance of adding this as a feature to the predict function? Something like predict(model, text, exclude_docs=text, k=3)

jwijffels commented 4 years ago

Maybe I'm misunderstanding the question. predict already has a basedoc argument, which does something similar: it limits the predictions to the set of docs you provide in basedoc (though maybe not in the trainmode setting you are referring to).

rdatasculptor commented 4 years ago

I was aware of basedoc. I am just looking for a possibility to find similar documents while taking into account some dissimilar basedocs. I am sorry my question was not clear.

jwijffels commented 4 years ago

Why not just filter these out of basedoc, by computing the similarity between the texts in basedoc and the other docs that you do not want included?
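The filtering suggestion could look roughly like the following sketch. It is plain Python with made-up vectors; `filter_candidates` and the `threshold` value are hypothetical, and how the embeddings are obtained from the model is left out. The idea is to drop every candidate that is too similar to any unwanted document, and only then pass the remaining candidates on as basedoc.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filter_candidates(candidates, unwanted, threshold=0.7):
    """Keep candidates whose highest similarity to any unwanted vector
    stays below the threshold (hypothetical helper, arbitrary cutoff)."""
    kept = {}
    for name, vec in candidates.items():
        worst = max((cosine(vec, bad) for bad in unwanted), default=0.0)
        if worst < threshold:
            kept[name] = vec
    return kept

# Toy embeddings (made up for illustration).
candidates = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.7, 0.7],
    "doc_c": [0.0, 1.0],
}
unwanted = [[0.1, 1.0]]  # embeddings of the docs to stay away from

kept = filter_candidates(candidates, unwanted)
print(sorted(kept))  # names that would remain in the candidate set
```

A hard threshold is only one design choice; the cutoff would need tuning on real data, and a soft penalty on the similarity score would be an alternative.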

rdatasculptor commented 4 years ago

I will figure some things out. Thanks!

rdatasculptor commented 4 years ago

In addition to the question I asked above I want to give an example from word2vec tutorials. What I meant to say was, is it possible to do something like France + Berlin - Germany = Paris on a sentence or article level with Ruimtehol?

jwijffels commented 4 years ago

Yes.
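As a rough illustration of what such arithmetic might look like at the sentence level, here is a sketch in plain Python. It assumes a sentence embedding is simply the mean of its word vectors, which is a simplification and not necessarily how StarSpace aggregates words. The vocabulary, the vectors, and the helpers `sentence_vector` and `analogy` are all invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sentence_vector(sentence, word_vecs):
    """Mean of the known word vectors (assumed aggregation, see lead-in)."""
    vecs = [word_vecs[w] for w in sentence.lower().split() if w in word_vecs]
    if not vecs:
        raise ValueError("no known words in: " + sentence)
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def analogy(a, b, c, candidates, word_vecs):
    """Return the candidate sentence closest to vec(a) + vec(b) - vec(c)."""
    va, vb, vc = (sentence_vector(s, word_vecs) for s in (a, b, c))
    query = [x + y - z for x, y, z in zip(va, vb, vc)]
    return max(candidates,
               key=lambda s: cosine(query, sentence_vector(s, word_vecs)))

# Toy vocabulary; dimensions loosely mean (French, German, capital).
word_vecs = {
    "travel":  [0.2, 0.2, 0.2],
    "france":  [1.0, 0.0, 0.0],
    "germany": [0.0, 1.0, 0.0],
    "paris":   [1.0, 0.0, 1.0],
    "berlin":  [0.0, 1.0, 1.0],
}
candidates = ["travel to paris", "travel to berlin", "travel to germany"]

best = analogy("travel to france", "travel to berlin", "travel to germany",
               candidates, word_vecs)
print(best)
```

With these hand-picked vectors the France + Berlin - Germany query lands on the Paris sentence. Real embeddings are far noisier, so the nearest neighbour will not always be the expected one.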