Tradeshift / blayze

A fast and flexible Naive Bayes implementation for the JVM
MIT License

Specify prior pseudocounts per outcome #32

Closed dadib closed 4 years ago

dadib commented 4 years ago

This changes the outcome pseudoCount parameter from a scalar to a map, which lets us encode our prior beliefs about the outcome class distribution. A limitation of the current approach, a single scalar pseudocount, is that it is applied only to outcomes that have already been observed in the training data. When the model has received little training data, it predicts the outcomes it has already seen with high confidence, regardless of the pseudocount.

It is common to know which classes exist even though they haven't been observed yet, so it is useful to be able to use that knowledge for predictions.
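To illustrate the difference, here is a minimal Java sketch of the two smoothing schemes applied to the outcome prior. It is not blayze's actual code: the class `PriorSketch` and the methods `scalarPrior` and `mapPrior` are hypothetical names chosen for this example, and the sketch assumes the prior is simply the smoothed count of each outcome divided by the smoothed total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions, not blayze's API.
class PriorSketch {

    // Scalar pseudocount: smoothing is applied only to outcomes present in the counts,
    // so an outcome that was never observed gets no probability mass at all.
    static Map<String, Double> scalarPrior(Map<String, Integer> counts, double pseudoCount) {
        double total = 0.0;
        for (int count : counts.values()) {
            total += count + pseudoCount;
        }
        Map<String, Double> prior = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            prior.put(e.getKey(), (e.getValue() + pseudoCount) / total);
        }
        return prior;
    }

    // Per-outcome pseudocounts: every outcome named in the map takes part in the prior,
    // whether or not it has been observed in the training data.
    static Map<String, Double> mapPrior(Map<String, Integer> counts, Map<String, Double> pseudoCounts) {
        Map<String, Double> smoothed = new LinkedHashMap<>(pseudoCounts);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            smoothed.merge(e.getKey(), e.getValue().doubleValue(), Double::sum);
        }
        double total = 0.0;
        for (double value : smoothed.values()) {
            total += value;
        }
        Map<String, Double> prior = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : smoothed.entrySet()) {
            prior.put(e.getKey(), e.getValue() / total);
        }
        return prior;
    }
}
```

With three observations of "spam" and none of "ham", `scalarPrior(counts, 1.0)` assigns "spam" a prior of 1.0 and has no entry for "ham", while `mapPrior(counts, {spam: 1.0, ham: 1.0})` assigns 0.8 to "spam" and 0.2 to "ham".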