Closed liufuyang closed 2 years ago
To study how the accuracy changes with the training set size, one can use the following test script:
@Test
fun can_fit_uci_adult_dataset() {
    val train = uciAdult("adult.train.txt")
    val test = uciAdult("adult.test.txt")
    val index = arrayListOf(0, 5, 10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 200, 500, 1000, 2000, 10000, train.size)
    var acc = 0.0
    var model = Model()
    for (i in 1 until index.size) {
        model = model.batchAdd(train.subList(index[i - 1], index[i]))
        acc = test
            .parallelStream()
            .map {
                if (it.outcome == model.predict(it.inputs).maxBy { it.value }?.key) {
                    1.0
                } else {
                    0.0
                }
            }
            .toList()
            .average()
        println(acc)
    }
    Assert.assertTrue("expected $acc > 0.83", acc > 0.83)
}
The code gives an accuracy of 0.830. I also tried it with the old version, which gives 0.829 (and by changing the default pseudo count to 0.1 I got 0.8302929795467109 again).
This is a little odd: my Rust implementation, which should be very close to the old version of blayze, gives an accuracy of 0.833, so I was expecting the score here to be around 0.833 as well 🤔 ...
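For context on why the pseudo count can shift the score by a fraction of a percent, here is a minimal sketch of additive smoothing. It assumes the pseudo count acts as the usual additive smoothing term in a naive Bayes estimate for a categorical feature. The function `smoothedProbability` and its parameters are hypothetical names for illustration, not blayze's API, and the counts are made up.

```kotlin
// Hypothetical helper (not part of blayze): additive smoothing for the
// probability of one categorical feature value given a class.
fun smoothedProbability(
    count: Int,          // times this feature value was seen with the class
    total: Int,          // total observations of this feature for the class
    distinctValues: Int, // number of distinct values the feature can take
    pseudoCount: Double  // smoothing strength added to every value's count
): Double {
    require(pseudoCount > 0.0 || total > 0) { "estimate is undefined without data or smoothing" }
    return (count + pseudoCount) / (total + pseudoCount * distinctValues)
}

fun main() {
    // Illustrative numbers: a value never seen with this class,
    // 50 observations in total, 4 possible values.
    for (pseudoCount in listOf(1.0, 0.1)) {
        val p = smoothedProbability(count = 0, total = 50, distinctValues = 4, pseudoCount = pseudoCount)
        println("pseudoCount=$pseudoCount -> $p")
    }
}
```

A smaller pseudo count gives unseen values a lower probability, so the predicted class can flip for a few borderline test rows. If the Rust and Kotlin implementations use different defaults here, that alone could plausibly account for a difference in the third decimal place.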