Tradeshift / blayze

A fast and flexible Naive Bayes implementation for the JVM
MIT License

Add UCI Adult example #20

Closed liufuyang closed 2 years ago

liufuyang commented 5 years ago

The code gives an accuracy of 0.830. With the old version it gives 0.829 (and after changing the default pseudo count to 0.1, I got 0.8302929795467109 again).

This is a little odd: my Rust implementation, which should be very close to the old version of blayze, gets an accuracy of 0.833, so I was expecting the score here to be around 0.833 as well 🤔 ...
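For context on why the pseudo count can move the score at all, here is a minimal sketch of additive smoothing for a single categorical feature. It is not blayze's actual code: the function `smoothedProbability`, its parameters, and the numbers in `main` are all hypothetical and only illustrate the general technique.

```kotlin
// Illustrative only: additive (Laplace/Lidstone) smoothing for one categorical
// feature. This is NOT blayze's implementation; all names and numbers are hypothetical.
fun smoothedProbability(
    count: Int,          // times this feature value was seen with the outcome
    total: Int,          // all observations of the feature for that outcome
    distinctValues: Int, // number of distinct values the feature can take
    pseudoCount: Double  // smoothing strength added to every value's count
): Double = (count + pseudoCount) / (total + pseudoCount * distinctValues)

fun main() {
    // A value never seen with this outcome, out of 50 observations and 10 possible values.
    for (pseudoCount in listOf(1.0, 0.1)) {
        val p = smoothedProbability(count = 0, total = 50, distinctValues = 10, pseudoCount = pseudoCount)
        println("pseudoCount=$pseudoCount -> P(value | outcome) = $p")
    }
}
```

A smaller pseudo count gives unseen values less probability mass, which can flip predictions for examples near the decision boundary. Whether that accounts for the 0.829 vs 0.830 difference above is not established here.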

liufuyang commented 5 years ago


To study how the accuracy changes with the training set size, one can use the following test script.

    @Test
    fun can_fit_uci_adult_dataset() {
        val train = uciAdult("adult.train.txt")
        val test = uciAdult("adult.test.txt")

        // Cumulative training-set sizes: step i adds train[index[i - 1] until index[i]] to the model.
        val index = arrayListOf(0, 5, 10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 200, 500, 1000, 2000, 10000, train.size)
        var acc = 0.0
        var model = Model()

        for (i in 1 until index.size) {
            model = model.batchAdd(train.subList(index[i - 1], index[i]))
            acc = test
                    .parallelStream()
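                    .map { example ->
                        // Score 1.0 when the most probable predicted outcome matches the label.
                        val predicted = model.predict(example.inputs).maxBy { it.value }?.key
                        if (example.outcome == predicted) 1.0 else 0.0
                    }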
                    .toList()
                    .average()

            println(acc)
        }
        Assert.assertTrue("expected $acc > 0.83", acc > 0.83)
    }