Tradeshift / blayze

A fast and flexible Naive Bayes implementation for the JVM
MIT License

Mutable/Immutable model #14

Closed rasmusbergpalm closed 2 years ago

rasmusbergpalm commented 6 years ago

This is more of a discussion piece than something I want to merge.

The problem: When working with large datasets, the models become large, and all the copying that results from working with immutable models becomes a bottleneck.

The solution:

Split models into a MutableModel with the batchAdd method and an immutable Model class with the predict method. The usage would be:

val mutable = MutableModel()
mutable.batchAdd(data)
...
val model = mutable.toModel()

Since the model itself is basically a container for Features, this means that the features also have to have a mutable and an immutable implementation.
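To make the split concrete, here is a minimal sketch of how the mutable and immutable pairs could fit together. Everything in it is an assumption for illustration: `Example`, `CategoricalFeature`, `MutableCategoricalFeature`, `toFeature` and the `count` accessor are hypothetical names and not blayze's actual classes, and prediction is left out to keep it short.

```kotlin
// Hypothetical sketch: these classes illustrate the proposed split and are
// not blayze's actual implementation.

// One training example: an outcome label plus categorical feature values.
data class Example(val outcome: String, val features: Map<String, String>)

// Immutable feature: a read-only snapshot of outcome -> value -> count.
class CategoricalFeature(private val counts: Map<String, Map<String, Int>>) {
    fun count(outcome: String, value: String): Int = counts[outcome]?.get(value) ?: 0
}

// Mutable feature: counts are updated in place, with no copy per batch.
class MutableCategoricalFeature {
    private val counts = HashMap<String, HashMap<String, Int>>()

    fun add(outcome: String, value: String) {
        val byValue = counts.getOrPut(outcome) { HashMap() }
        byValue[value] = (byValue[value] ?: 0) + 1
    }

    // The only copy happens here, once, when training is done.
    fun toFeature(): CategoricalFeature =
        CategoricalFeature(counts.mapValues { (_, byValue) -> byValue.toMap() })
}

// Immutable model: a container of immutable features.
class Model(private val features: Map<String, CategoricalFeature>) {
    // Read-only access; the real Model would expose predict here.
    fun count(feature: String, outcome: String, value: String): Int =
        features[feature]?.count(outcome, value) ?: 0
}

// Mutable model: a container of mutable features, used only for training.
class MutableModel {
    private val features = HashMap<String, MutableCategoricalFeature>()

    fun batchAdd(data: List<Example>) {
        for (example in data) {
            for ((name, value) in example.features) {
                features.getOrPut(name) { MutableCategoricalFeature() }
                    .add(example.outcome, value)
            }
        }
    }

    fun toModel(): Model = Model(features.mapValues { (_, f) -> f.toFeature() })
}
```

In this sketch, `toModel()` takes a snapshot, so later calls to `batchAdd` on the mutable model do not change a `Model` that has already been handed out.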

There's also another improvement: updating the features in parallel. This speeds up training quite a lot if there's more than one feature. It does not depend on the mutable model/features, though.
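A rough sketch of the parallel update, assuming each feature only touches its own state so that features can be updated independently. `MutableCounts` and `updateInParallel` are hypothetical names used for illustration, and Java's parallel streams are just one way to do this; the PR itself may do it differently.

```kotlin
// Hypothetical sketch: one task per feature, using Java parallel streams.

// Stand-in for a feature with in-place count updates.
class MutableCounts {
    private val counts = HashMap<String, Int>()

    fun batchAdd(values: List<String>) {
        for (v in values) counts[v] = (counts[v] ?: 0) + 1
    }

    fun count(value: String): Int = counts[value] ?: 0
}

// updates maps a feature name to the new values observed for that feature.
// Each feature is touched by exactly one task, so the features themselves
// need no locking; the features map is only read while the stream runs.
fun updateInParallel(
    features: Map<String, MutableCounts>,
    updates: Map<String, List<String>>
) {
    updates.entries.parallelStream().forEach { entry ->
        features.getValue(entry.key).batchAdd(entry.value)
    }
}
```

With a single feature there is only one task, so this gives no speedup in that case, which matches the caveat above.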

Let me know what you think @dadib @lre @florianlaws @liufuyang