dmlc / MXNet.jl

MXNet Julia Package - flexible and efficient deep learning in Julia

Dynamic dispatching in optimizers #44

Open Evizero opened 8 years ago

Evizero commented 8 years ago

I could be wrong about this, but looking at the code for sgd and adam, it looks like there is dynamic function dispatch going on, because the types of the optimizer options fields are not known at compile time.

For example, for sgd in update:

lr = get_learning_rate(self.opts.lr_scheduler, self.state)

If I understand Julia correctly, lr_scheduler is stored as a boxed Any value, so the appropriate get_learning_rate method has to be looked up at runtime, which can have performance implications.
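As a minimal illustration (toy types, not the actual MXNet.jl definitions), a field typed as Any hides the concrete scheduler type from the compiler, so the method lookup happens at runtime:

abstract type AbstractLearningRateScheduler end

struct FixedScheduler <: AbstractLearningRateScheduler
    lr :: Float64
end

get_learning_rate(s :: FixedScheduler, t) = s.lr

struct UntypedOptions
    lr_scheduler :: Any    # boxed: concrete type unknown at compile time
end

opts = UntypedOptions(FixedScheduler(0.01))
lr = get_learning_rate(opts.lr_scheduler, 1)    # method resolved at runtime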

What do you think?

pluskid commented 8 years ago

Yes, at best I could make it AbstractLearningRateScheduler, which is better than Any but would still need dynamic dispatch, because we do not know at compile time which scheduler the user will choose. I use Any here mainly for the convenience of being able to use nothing as the default value.
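A rough sketch of that tradeoff (illustrative field, not the real @defstruct output):

abstract type AbstractLearningRateScheduler end

struct SGDOptions
    # Narrower than Any, with Nothing in the Union to keep the default-nothing
    # convenience; the concrete scheduler type is still unknown at compile time,
    # so get_learning_rate(opts.lr_scheduler, state) still dispatches at runtime.
    lr_scheduler :: Union{Nothing, AbstractLearningRateScheduler}
end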

Evizero commented 8 years ago

I haven't investigated it beyond looking at the code on GitHub; it may well be that the impact is negligible in practice. I think I'll have some time during December to investigate this. If it turns out to be a bottleneck, we could probably avoid it by using type parameters (roughly as sketched below) after adapting @defstruct to support them.
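A sketch of the type-parameter idea, again with toy types; the real change would have to go through @defstruct:

abstract type AbstractLearningRateScheduler end

struct FixedScheduler <: AbstractLearningRateScheduler
    lr :: Float64
end

get_learning_rate(s :: FixedScheduler, t) = s.lr

struct TypedSGDOptions{T <: AbstractLearningRateScheduler}
    lr_scheduler :: T    # concrete scheduler type is part of the options type
end

opts = TypedSGDOptions(FixedScheduler(0.01))
lr = get_learning_rate(opts.lr_scheduler, 1)    # resolvable at compile time

The downside is that every scheduler type then produces a different options type, which @defstruct would need to handle.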

pluskid commented 8 years ago

@Evizero You are very welcome to run a profiler and locate the bottleneck in the computation! :) My guess is that the optimizer is not a bottleneck, and even if it becomes one, the cost would more likely be in computing the momentum matrices, etc., rather than in looking up the learning rate. But I might be wrong.
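For what it's worth, here is a self-contained micro-benchmark sketch (toy types, stdlib only) that isolates just the dispatch cost; whether the optimizer matters at all would of course need profiling an actual training run:

abstract type AbstractLearningRateScheduler end

struct FixedScheduler <: AbstractLearningRateScheduler
    lr :: Float64
end

get_learning_rate(s :: FixedScheduler, t) = s.lr

struct UntypedOpts
    lr_scheduler :: Any
end

struct TypedOpts{T <: AbstractLearningRateScheduler}
    lr_scheduler :: T
end

function total_lr(opts, n)
    acc = 0.0
    for t in 1:n
        acc += get_learning_rate(opts.lr_scheduler, t)
    end
    return acc
end

for opts in (UntypedOpts(FixedScheduler(0.01)), TypedOpts(FixedScheduler(0.01)))
    total_lr(opts, 1)                   # warm up / compile
    @time total_lr(opts, 10_000_000)    # compare time and allocations: dynamic vs. static dispatch
end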