JuliaText / TextAnalysis.jl

Julia package for text analysis

Proposing API tweak for sentiment analysis #122

Closed aquatiko closed 5 years ago

aquatiko commented 5 years ago

Regarding #83, the proposed API changes are to change https://github.com/JuliaText/TextAnalysis.jl/blob/bb81bc93b4210b0c04f212e5c5040a63c30e0612/src/sentiment.jl#L81 to

function (m::SentimentAnalyzer)(d::AbstractDocument, default_dict::Bool=true)
    m.model(tokens(d), default_dict)
end

and https://github.com/JuliaText/TextAnalysis.jl/blob/bb81bc93b4210b0c04f212e5c5040a63c30e0612/src/sentiment.jl#L38 to

function get_sentiment(ip::Array{T, 1}, weight, rwi, default_dict) where T <: AbstractString
    # Forward pass: embedding -> flatten -> dense (relu) -> dense (sigmoid)
    model = (x,) -> begin
        a_1 = embedding(weight[:embedding_1]["embedding_1"]["embeddings:0"], x)
        a_2 = flatten(a_1)
        a_3 = Flux.Dense(weight[:dense_1]["dense_1"]["kernel:0"], weight[:dense_1]["dense_1"]["bias:0"], Flux.relu)(a_2)
        a_4 = Flux.Dense(weight[:dense_2]["dense_2"]["kernel:0"], weight[:dense_2]["dense_2"]["bias:0"], Flux.sigmoid)(a_3)
        return a_4
    end
    res = Array{Int, 1}()
    skipped = Array{String,1}()
    for ele in ip
        # With default_dict set, words missing from rwi are skipped; otherwise rwi[ele] throws a KeyError
        if default_dict && !haskey(rwi, ele)
            push!(skipped,ele)
        else
            push!(res, rwi[ele])
        end
    end

    if default_dict
        println("Skipped words (not present in dictionary): ", skipped)
    end
    return model(pad_sequences(res))[1]
end

I haven't tested the whole integration yet; this just shows the major parts.

aviks commented 5 years ago

On Slack, @oxinabox made a suggestion that seems reasonable:

Instead of having a keyword argument as a dict, make it a function that returns an iterator. The default value of that function would be (x)->[] (i.e., an empty array). In the get_sentiment function, when we encounter a missing word, call this function with that word and append whatever it returns to res.

Also, do not print anything in this function; it can get very annoying to see log output like this from functions that work on user-supplied data.
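
A minimal sketch of how the callback idea could look, with the model call left out so it runs on its own. The names `words_to_indices` and `handle_unknown` are hypothetical (not part of TextAnalysis.jl), and the small `rwi` dictionary is made up for illustration; the real word index would come from the pretrained model.

```julia
# Hypothetical helper, not part of TextAnalysis.jl: maps tokens to word indices
# and delegates every unknown word to a caller-supplied function.
function words_to_indices(tokens::AbstractVector{<:AbstractString},
                          rwi::AbstractDict{<:AbstractString,<:Integer},
                          handle_unknown = word -> Int[])
    res = Int[]
    for tok in tokens
        if haskey(rwi, tok)
            push!(res, rwi[tok])
        else
            # The handler may return zero, one, or several indices (any iterable).
            append!(res, handle_unknown(tok))
        end
    end
    return res
end

# Made-up word index, for illustration only.
rwi = Dict("good" => 1, "movie" => 2, "bad" => 3)
tokens = ["good", "moovie", "bad"]

# Default handler: unknown words are dropped silently, nothing is printed.
dropped = words_to_indices(tokens, rwi)

# Map unknown words to a placeholder index instead (0 is an arbitrary choice here).
mapped = words_to_indices(tokens, rwi, word -> [0])

# A caller who wants to know what was skipped can record it without any logging
# inside the function itself.
skipped = String[]
words_to_indices(tokens, rwi, word -> (push!(skipped, word); Int[]))
```

With this shape the caller decides the policy: passing `word -> throw(KeyError(word))` would make unknown words an error, and the default keeps the function quiet.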