JuliaText / TextAnalysis.jl

Julia package for text analysis

`cos_similarity` should return `Symmetric` matrix #283

Open prbzrg opened 2 weeks ago

rssdev10 commented 1 week ago

Hi, the return type of this function cannot be changed without breaking compatibility. It is possible to add a separate function, though. https://github.com/JuliaText/TextAnalysis.jl/blob/master/src/tf_idf.jl#L329

Before changing this implementation to return a symmetric matrix, it would also be good to measure the performance difference, as well as the effect on the number of memory allocations and the total memory used.

```julia
function cos_similarity(tfm::AbstractMatrix)
    cs = tfm * tfm'
    d = sqrt.(diag(cs))
    # prevent division by zero  (only occurs for empty documents)
    d[findall(iszero, d)] .= 1
    cs ./ (d * d')
end
```
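
To make the comparison concrete, here is a rough sketch of what a separate function could look like, together with a crude allocation check. The name `cos_similarity_symmetric` and the toy matrix are assumptions made for illustration and are not part of TextAnalysis.jl. The sketch simply wraps the existing result in `LinearAlgebra.Symmetric`.

```julia
using LinearAlgebra

# Existing implementation, copied from the snippet above.
function cos_similarity(tfm::AbstractMatrix)
    cs = tfm * tfm'
    d = sqrt.(diag(cs))
    # prevent division by zero (only occurs for empty documents)
    d[findall(iszero, d)] .= 1
    cs ./ (d * d')
end

# Hypothetical variant (the name is an assumption, not library API):
# same computation, result wrapped in a Symmetric view.
function cos_similarity_symmetric(tfm::AbstractMatrix)
    cs = tfm * tfm'
    d = sqrt.(diag(cs))
    d[findall(iszero, d)] .= 1
    Symmetric(cs ./ (d * d'))
end

# Toy term-frequency matrix (rows are documents); the last row is an empty document.
tfm = [1.0 2.0 0.0;
       0.0 1.0 1.0;
       0.0 0.0 0.0]

a = cos_similarity(tfm)
b = cos_similarity_symmetric(tfm)

# Both calls above also serve as warm-up, so compilation is not counted below.
alloc_plain = @allocated cos_similarity(tfm)
alloc_sym = @allocated cos_similarity_symmetric(tfm)
println("bytes allocated: plain = ", alloc_plain, ", symmetric = ", alloc_sym)
```

Note that `Symmetric` only wraps the dense parent matrix without copying it, so this particular sketch is not expected to reduce memory use. Any gain would have to come from downstream code that dispatches on the `Symmetric` type. For real measurements, a proper benchmark on realistic corpus sizes would be more reliable than a single `@allocated` call.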