Kyubyong / wordvectors

Pre-trained word vectors of 30+ languages
MIT License
2.22k stars, 393 forks

Unable to get most similar word #20

Closed jumerckx closed 6 years ago

jumerckx commented 6 years ago

I've downloaded the French word2vec embeddings and parsed the .tsv file to use in Julia. When I implement a function to find the most similar word to a given word (using cosine similarity), I don't get the right result. Chances are my code is wrong, since I'm a complete beginner, but I wanted to check whether anyone has been able to get this working.

Here is my Julia code:

function similarWord(A::String)

    similarWord = nothing
    distance = 1000

    haskey(embeddings, A) ? A = embeddings[A] : throw("unknown word")

    for word in embeddings
        B = word[2]
        new_distance = (A'B)/(norm(A, 2)*norm(B, 2))
        if new_distance < distance
            distance = new_distance
            similarWord = word[1]
        end
    end
    return(similarWord, distance)
end

Example: similarWord("ville") returns ("commentant", -0.3068699573567243). "ville" means "city", while "commentant" means "commenting".

Thanks in advance,

Jules
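A note for readers on the snippet above: cosine similarity lies between -1 and 1, and a higher value means the vectors point in more similar directions. Keeping the smallest value, as the loop does, therefore picks the least similar word. The following is a minimal sketch of that behaviour on made-up 2-dimensional vectors; the `cosine` helper is a hypothetical name introduced here, not part of the original post.

```julia
using LinearAlgebra  # for dot and norm

# Cosine similarity between two vectors (hypothetical helper, not from the thread).
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

a = [1.0, 0.0]
same      = cosine(a, [2.0, 0.0])   # same direction
opposite  = cosine(a, [-3.0, 0.0])  # opposite direction
unrelated = cosine(a, [0.0, 5.0])   # orthogonal

println((same, opposite, unrelated))  # (1.0, -1.0, 0.0)
```

A negative result such as -0.3068699573567243 is consistent with a word that points away from "ville", not towards it.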

jumerckx commented 6 years ago

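My bad... my code was badly wrong: it kept the word with the lowest cosine similarity instead of the highest, and it didn't skip the query word itself. After I changed it, it works well.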

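using LinearAlgebra  # provides norm (Julia 0.7 and later)

# `embeddings` is a global Dict mapping each word to its vector.
function similarWord(A::String)

    haskey(embeddings, A) || throw(KeyError(A))
    A_emb = embeddings[A]

    mostSimilar = nothing
    max_sim = -Inf  # cosine similarity is never below -1

    for (word, B) in embeddings
        word == A && continue  # skip the query word itself

        # cosine similarity: dot product over the product of the L2 norms
        sim = (A_emb'B)/(norm(A_emb, 2)*norm(B, 2))
        if sim > max_sim
            max_sim = sim
            mostSimilar = word
        end
    end
    return (mostSimilar, max_sim)
end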
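
For anyone who wants to try this search without downloading the vectors, here is a small self-contained sketch. It assumes a tiny made-up table (`toy`) in place of the real French embeddings and passes the table in as an argument instead of using a global; the names `toy` and `most_similar` are hypothetical and the vectors are invented purely for illustration.

```julia
using LinearAlgebra  # for dot and norm

# Hypothetical 2-dimensional embeddings, made up for illustration only.
toy = Dict(
    "ville"  => [1.0, 0.1],
    "cité"   => [0.9, 0.2],
    "manger" => [-0.2, 1.0],
)

# Same search as above, but with the table passed in as an argument.
function most_similar(embeddings::Dict, query::String)
    haskey(embeddings, query) || throw(KeyError(query))
    q = embeddings[query]
    best_word, best_sim = nothing, -Inf
    for (word, v) in embeddings
        word == query && continue          # skip the query word itself
        sim = dot(q, v) / (norm(q) * norm(v))
        if sim > best_sim                  # keep the highest similarity
            best_word, best_sim = word, sim
        end
    end
    return best_word, best_sim
end

word, sim = most_similar(toy, "ville")
println(word)  # cité
```

Passing the table as an argument avoids relying on a non-constant global, which tends to be slow in Julia, and makes the function easy to check on small inputs like this one.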