Tradeshift / blayze

A fast and flexible Naive Bayes implementation for the JVM
MIT License

Add test to illustrate some issues #16

Closed. liufuyang closed this 5 years ago.

liufuyang commented 5 years ago

I seem to have found a few issues, though I'm not sure whether they really matter. I will create two separate issues and continue the discussion on those issue pages.

Please take a look at the first one: https://github.com/Tradeshift/blayze/issues/17

Then the second one: https://github.com/Tradeshift/blayze/issues/18

rasmusbergpalm commented 5 years ago

Fixed in #19

There was a bug in the Gaussian feature implementation: if a Gaussian had n < 2 observations or zero variance, it returned a log probability of 0, which corresponds to a probability of 1.
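To see why a log probability of 0 is harmful in naive Bayes, consider the minimal sketch below. It is a hypothetical Python reproduction of the behaviour described above, not blayze's actual code or API (blayze runs on the JVM). The names `gaussian_log_pdf` and `buggy_log_prob` are illustrative assumptions. A class whose Gaussian has only one observation always scores 0, so it can outscore a well-fitted class even when the input sits exactly on that class's mean.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def gaussian_log_pdf(x: float, mu: float, var: float) -> float:
    """Log density of N(mu, var) at x; requires var > 0."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def buggy_log_prob(x: float, samples: Sequence[float]) -> float:
    """Hypothetical reproduction of the reported behaviour: a degenerate
    Gaussian (n < 2 or zero variance) falls back to log p = 0."""
    if len(samples) < 2 or pvariance(samples) == 0:
        return 0.0  # log(1): every x is treated as certain
    return gaussian_log_pdf(x, mean(samples), pvariance(samples))


class_a = [-1.0, 0.0, 1.0]  # mean 0, population variance 2/3
class_b = [5.0]             # a single observation: degenerate
x = 0.0                     # sits exactly on class A's mean

score_a = buggy_log_prob(x, class_a)  # about -0.72
score_b = buggy_log_prob(x, class_b)  # 0.0, i.e. probability 1
print(score_a, score_b)  # class B "wins" although x is far from its only sample
```

Because the degenerate branch ignores `x` entirely, the feature contributes the same "certain" score for every input, which is exactly the situation the fix addresses.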

Furthermore, using maximum-likelihood (MLE) estimates for the Gaussians led to overly confident parameter estimates, which could "overrule" other features. After the upgrade to Bayesian naive Bayes, the log probabilities should better reflect the uncertainty in the estimates.
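The difference between plug-in MLE estimates and a Bayesian treatment can be sketched as follows. This assumes a conjugate Normal-Gamma prior over the mean and precision, whose posterior predictive is a Student-t distribution; the hyperparameters `mu0`, `kappa0`, `alpha0`, `beta0` and all function names are hypothetical, and blayze's actual prior and parameterization may differ. With two nearly identical observations, the MLE variance is tiny, so a nearby input gets an extreme log probability, while the predictive distribution stays moderate and remains defined even for n < 2.

```python
import math
from typing import Sequence


def mle_log_pdf(x: float, samples: Sequence[float]) -> float:
    """Plug-in Gaussian with MLE mean and variance (needs n >= 2 and variance > 0)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n  # MLE divides by n
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def student_t_log_pdf(x: float, df: float, loc: float, scale2: float) -> float:
    """Log density of a location-scale Student-t; scale2 is the squared scale."""
    z = (x - loc) ** 2 / (df * scale2)
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi * scale2)
            - (df + 1) / 2 * math.log1p(z))


def bayes_log_pdf(x: float, samples: Sequence[float], mu0: float = 0.0,
                  kappa0: float = 1.0, alpha0: float = 1.0, beta0: float = 1.0) -> float:
    """Posterior predictive under a Normal-Gamma prior (hypothetical hyperparameters)."""
    n = len(samples)
    xbar = sum(samples) / n if n else 0.0
    ss = sum((s - xbar) ** 2 for s in samples)  # sum of squared deviations
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    scale2 = beta_n * (kappa_n + 1) / (alpha_n * kappa_n)
    return student_t_log_pdf(x, 2 * alpha_n, mu_n, scale2)


samples = [1.0, 1.1]  # two nearly identical observations
print(mle_log_pdf(2.0, samples))    # about -178.4: extremely confident
print(bayes_log_pdf(2.0, samples))  # about -1.9: reflects the uncertainty
print(bayes_log_pdf(2.0, [1.0]))    # still finite for n = 1
```

The heavier tails of the Student-t shrink as more observations arrive, so a feature's influence grows with the evidence behind it instead of being maximal from the start.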

If you still have issues, please re-open.