Open cbodenst opened 8 years ago
So an Algorithm implements a `partial_fit` method and Dataset provides `load_batch`, for example?
Yes, the dataset should provide a `load_batch` method. I don't think `partial_fit` is possible for all algorithms (it should work for neural networks trained with SGD, but not for k-means), so the algorithm itself has to decide how to deal with the batches. Maybe the user could define the batch size as an algorithm parameter.
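To make the proposed split concrete, here is a minimal sketch in Python. It is only an illustration, not JuML's API: the `Dataset` and `RunningMean` classes, `num_batches`, and the signature `load_batch(index, batch_size)` are all assumptions. The algorithm takes the batch size as a parameter and decides itself how to consume the batches.

```python
from typing import List, Sequence


class Dataset:
    """Hypothetical in-memory dataset that hands out fixed-size batches."""

    def __init__(self, samples: Sequence[float]) -> None:
        self.samples = list(samples)

    def num_batches(self, batch_size: int) -> int:
        # Ceiling division so a final, smaller batch is not dropped.
        return (len(self.samples) + batch_size - 1) // batch_size

    def load_batch(self, index: int, batch_size: int) -> List[float]:
        # Assumed signature: return the index-th slice of batch_size samples.
        start = index * batch_size
        return self.samples[start:start + batch_size]


class RunningMean:
    """Toy algorithm whose state can be updated one batch at a time."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size  # user-defined algorithm parameter
        self.count = 0
        self.mean = 0.0

    def partial_fit(self, batch: List[float]) -> None:
        # Incremental mean update; earlier batches are never needed again.
        for x in batch:
            self.count += 1
            self.mean += (x - self.mean) / self.count

    def fit(self, dataset: Dataset) -> "RunningMean":
        # The algorithm, not the dataset, decides how batches are consumed.
        for i in range(dataset.num_batches(self.batch_size)):
            self.partial_fit(dataset.load_batch(i, self.batch_size))
        return self


model = RunningMean(batch_size=4).fit(Dataset(range(10)))
print(model.mean)
```

An algorithm that cannot be updated incrementally would keep the same `fit(dataset)` entry point but handle the batches differently internally, which is why the loop lives in the algorithm rather than in the dataset.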
Right now, JuML just loads all the data onto the device and does not check whether enough memory is left. It would be smarter to enable batched processing, where the algorithm pulls just a batch from the dataset that fits into the device's memory, does its computation, repeats this for all batches, and finally performs a reduction.
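The pull, compute, reduce loop could look roughly like the following Python sketch. Everything here is hypothetical and not taken from JuML: the helper names, the memory budget, and the bytes-per-sample figure are made up for illustration, and a plain list stands in for the dataset. The per-batch computation is a partial sum, and the final reduction combines the partial results into a mean.

```python
from functools import reduce
from typing import Iterator, List, Tuple


def batch_size_for(memory_budget_bytes: int, bytes_per_sample: int) -> int:
    """Largest number of samples that fits the assumed free device memory."""
    return max(1, memory_budget_bytes // bytes_per_sample)


def iter_batches(data: List[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive slices; stands in for pulling batches from a dataset."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def process_batch(batch: List[float]) -> Tuple[float, int]:
    """Per-batch computation: partial sum and sample count."""
    return sum(batch), len(batch)


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduction step: merge two partial results."""
    return a[0] + b[0], a[1] + b[1]


data = [float(x) for x in range(1, 11)]

# Assumed numbers: 32 bytes of free memory, 8 bytes per sample.
size = batch_size_for(memory_budget_bytes=32, bytes_per_sample=8)

# Only one batch of samples is materialized at a time; the partials are tiny.
partials = [process_batch(batch) for batch in iter_batches(data, size)]

total, count = reduce(combine, partials, (0.0, 0))
mean = total / count
print(size, len(partials), mean)
```

In a real implementation the memory budget would have to come from querying the device, and the reduction would depend on the algorithm.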