dfdx / NaiveBayes.jl

Naive Bayes classifier
MIT License
25 stars 19 forks source link

Bug in MultinomialNB? Incorrect calculation of P(x|c). #42

Closed ablaom closed 3 years ago

ablaom commented 5 years ago

I am struggling to understand the code below from NaiveBayes.jl/src/multinomial.jl. It seems to me that the denominator in x_priors_for_c = m.x_counts[c] ./ m.x_totals is incorrect. Rather, the denominator should be sum(m.x_counts[c]), no?

Since the test code for MultinomialNB is not very thorough, this could easily have been missed.

@dfdx Assuming you agree this is a bug, if I create a PR with a correction, plus rigorous test, would you consider reviewing in near future?

"""Calculate log P(x|C)"""
function logprob_x_given_c(m::MultinomialNB, x::Vector{Int64}, c::C) where C
    x_priors_for_c = m.x_counts[c] ./ m.x_totals
    x_probs_given_c = x_priors_for_c .^ x
    logprob = sum(log(x_probs_given_c))
    return logprob
end
dfdx commented 5 years ago

Yes, this package was created in prehistoric times and has never been seriously reviewed since then, so it's pretty possible it contains even trivial bugs.

I might have had something different in my mind at the time of writing, but looking at the code now I believe you are right. This may also be the reason for #40.

If you post a PR fixing this I'll try to review it within 12 hours.

ablaom commented 4 years ago

Okay, I'm going to try and get a PR for this during the next week.