This is more of a discussion piece than something I want to merge.
The problem:
When working with large datasets, the models become large, and all the copying that results from working with immutable models becomes a bottleneck.
The solution:
Split models into a MutableModel with the batchAdd method and an immutable Model class with the predict method. The usage would be:
val mutable = MutableModel()
mutable.batchAdd(data)
...
val model = mutable.toModel()
Since the model itself is basically a container for Features, this means that the features also have to have a mutable and an immutable implementation.
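To make the split concrete, here is a minimal sketch of how the mutable and immutable halves could fit together. Only MutableModel, Model, batchAdd, predict and toModel come from the description above; MutableFeature, Feature, toFeature, score, the String data type, the count-based state and the two default extractors are assumptions made up for illustration, not the code in this PR.

```scala
import scala.collection.mutable.HashMap

// Hypothetical immutable feature: a frozen count table used at prediction time.
final case class Feature(extract: String => String, counts: Map[String, Int]) {
  def score(item: String): Int = counts.getOrElse(extract(item), 0)
}

// Hypothetical mutable counterpart: counts are updated in place,
// so adding a batch does not copy the existing state.
final class MutableFeature(extract: String => String) {
  private val counts = HashMap.empty[String, Int]

  def batchAdd(data: Seq[String]): Unit =
    data.foreach { item =>
      val key = extract(item)
      counts.update(key, counts.getOrElse(key, 0) + 1)
    }

  // The only copy: taken once, when training is finished.
  def toFeature(): Feature = Feature(extract, counts.toMap)
}

// Immutable model: safe to share, only supports prediction.
final case class Model(features: Vector[Feature]) {
  // Summing the feature scores is a stand-in for the real prediction logic.
  def predict(item: String): Int = features.map(_.score(item)).sum
}

// Mutable model: a container of mutable features, only supports training.
final class MutableModel(features: Vector[MutableFeature]) {
  def batchAdd(data: Seq[String]): Unit = features.foreach(_.batchAdd(data))

  def toModel(): Model = Model(features.map(_.toFeature()))
}

object MutableModel {
  // Two illustrative features: the item itself and its length.
  def apply(): MutableModel =
    new MutableModel(Vector(
      new MutableFeature(s => s),
      new MutableFeature(_.length.toString)
    ))
}
```

Because toModel() copies the counts into immutable maps, later calls to batchAdd on the mutable model do not affect a Model that has already been handed out.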
There's also another improvement: updating the features in parallel. This speeds up training quite a lot when there is more than one feature. It does not depend on the mutable model/features, though.
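One way the parallel update could look, as a rough sketch using standard-library Futures. CountingFeature, ParallelTraining and batchAddParallel are hypothetical names, and the count-based feature is only a stand-in; the actual change may use a different mechanism.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-in for a feature that is updated in place.
final class CountingFeature(extract: String => String) {
  private val counts = scala.collection.mutable.HashMap.empty[String, Int]

  def batchAdd(data: Seq[String]): Unit =
    data.foreach { item =>
      val key = extract(item)
      counts.update(key, counts.getOrElse(key, 0) + 1)
    }

  def count(key: String): Int = counts.getOrElse(key, 0)
}

object ParallelTraining {
  // Runs one task per feature. Each task touches only its own feature's
  // state, so the features need no locking among themselves.
  def batchAddParallel(features: Seq[CountingFeature], data: Seq[String]): Unit = {
    val updates = features.map(f => Future(f.batchAdd(data)))
    // Block until every feature has consumed the batch.
    Await.result(Future.sequence(updates), Duration.Inf)
  }
}
```

With a single feature there is only one task, so this gives no speedup in that case.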
Let me know what you think @dadib @lre @florianlaws @liufuyang