JuliaML / MLDataPattern.jl

Utility package for subsetting, resampling, iteration, and partitioning of various types of data sets in Machine Learning
http://mldatapatternjl.readthedocs.io/

RandomBatches performance #15

Closed: mihirparadkar closed this issue 7 years ago

mihirparadkar commented 7 years ago

I've been implementing mini-batch gradient descent, using MLDataPattern to supply mini-batches from the training data. However, computing the gradient from the returned SubArrays is very slow, because matrix multiplication on them doesn't dispatch to a BLAS call.

I sped up the mini-batch iterations in my own code by copying each SubArray into a regular Array and doing the calculations on that, which gave an 8x speedup. The speed would improve further if the random indices were sorted, since sorted indices would give the SubArray much more predictable properties than unsorted ones.

Perhaps the iterator could also return a pre-allocated Array and update its entries at every step, instead of returning SubArrays, which have worse performance characteristics.
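
A minimal sketch of the workaround described above, written with Base Julia only and not with MLDataPattern's own API. The names `fill_batch!`, `X`, `w`, `buf`, and `idx` are hypothetical, the data is random, and the layout (one observation per column) is an assumption. It only illustrates the copy-into-a-reused-buffer idea and makes no attempt to reproduce the timing mentioned above.

```julia
using Random

# Hypothetical helper: copy the selected columns of X into a pre-allocated dense buffer.
# view(X, :, idx) is a SubArray indexed by a Vector{Int}, so it is not a strided array;
# copyto! overwrites buf in place, so no new array is allocated per mini-batch.
function fill_batch!(buf::Matrix{Float64}, X::Matrix{Float64}, idx::Vector{Int})
    copyto!(buf, view(X, :, idx))
    return buf
end

# Hypothetical data: 10 features by 1000 observations, one observation per column.
X = rand(10, 1000)
w = rand(10)
batchsize = 32

# Allocate the buffer once and reuse it for every step.
buf = Matrix{Float64}(undef, size(X, 1), batchsize)

for step in 1:5
    # Random column indices; sorting is optional and only makes the reads walk X in order.
    idx = sort!(randperm(size(X, 2))[1:batchsize])
    fill_batch!(buf, X, idx)
    # buf is a plain Matrix, so this product can take the BLAS-backed code path.
    scores = buf' * w
end
```

The buffer trades one copy per mini-batch for dense storage in every later operation on that batch; whether that pays off depends on how much work is done per batch.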

Evizero commented 7 years ago

Try BufferGetObs. It should do what you describe.

mihirparadkar commented 7 years ago

Thanks! I'll try that.

Evizero commented 7 years ago

If you find anything that bothers you or seems odd or inconvenient, please do follow up. I know a lot of this package's design is unconventional, and I am still trying to streamline things.