Open Evizero opened 8 years ago
Yes, at best I could make it `AbstractLearningRateScheduler`, which is better than `Any` but would still need dynamic dispatch, because we do not know at compile time what the user will use for learning rate scheduling. I use `Any` here mainly for the convenience of being able to use the default `nothing` value.
I haven't investigated it beyond reading the code on GitHub. It may be that, empirically, the influence is negligible. I think I'll have some time during December to investigate this. If it is a bottleneck, we could probably avoid it by using a type parameter, after adapting `@defstruct` to support one.
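The type-parameter idea can be sketched roughly as follows. This is a minimal illustration in Julia 1.x syntax, not the package's code: `FixedRate`, `SGDOptions`, the `0.01` fallback rate, and the two-argument `get_learning_rate` signature are all hypothetical, and the struct is written by hand rather than through `@defstruct`.

```julia
# Hypothetical stand-ins for illustration; not the package's actual definitions.
abstract type AbstractLearningRateScheduler end

struct FixedRate <: AbstractLearningRateScheduler
    lr::Float64
end

# Assumed signature: scheduler plus iteration count.
get_learning_rate(s::FixedRate, iter::Int) = s.lr
get_learning_rate(::Nothing, iter::Int) = 0.01   # assumed fallback rate

# The scheduler type is a type parameter, so each instance has a concrete field type.
# Allowing Nothing keeps the "default nothing value" convenience.
struct SGDOptions{S<:Union{Nothing,AbstractLearningRateScheduler}}
    lr_scheduler::S
end

SGDOptions() = SGDOptions(nothing)   # default: no scheduler

opts = SGDOptions(FixedRate(0.1))
default_opts = SGDOptions()

# With a concrete field type the compiler can resolve the method statically.
println(isconcretetype(fieldtype(typeof(opts), :lr_scheduler)))
println(get_learning_rate(opts.lr_scheduler, 1))
println(get_learning_rate(default_opts.lr_scheduler, 1))
```

The trade-off is that every distinct scheduler type yields a distinct `SGDOptions{S}` type, so code that stores the options must itself be generic over `S`.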
@Evizero You are very welcome to run a profiler and locate the bottleneck in the computation! :) My guess is that the optimizer is not a bottleneck, and even if it becomes one, the cost is more likely to be in computing the momentum matrices, etc., than in getting the learning rate. But I might be wrong.
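One way to check this empirically is to time and profile an update loop with Julia's standard tools. The sketch below assumes Julia 1.x; `fake_update!` is a hypothetical stand-in for an optimizer step, not the package's `update` function, so the numbers only show the method, not the real cost.

```julia
using Profile  # standard library sampling profiler

# Hypothetical workload standing in for an optimizer update loop.
function fake_update!(w::Vector{Float64}, g::Vector{Float64}, lr::Float64, steps::Int)
    for _ in 1:steps
        w .-= lr .* g   # in-place SGD-style step
    end
    return w
end

w = ones(1_000)
g = fill(0.5, 1_000)

# Warm up on a copy so compilation time is not included in the measurement.
fake_update!(copy(w), g, 0.01, 1)

elapsed = @elapsed fake_update!(w, g, 0.01, 10_000)
println("elapsed seconds: ", elapsed)

# Collect samples and print where the time was spent.
Profile.clear()
@profile fake_update!(w, g, 0.01, 10_000)
Profile.print(maxdepth = 8)
```

If the learning-rate lookup barely shows up in the profile next to the array arithmetic, the dynamic dispatch is unlikely to matter in practice.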
I could be wrong about this, but looking at the code for sgd and adam, it looks like there is dynamic function dispatch going on, because the types of the optimizer options are not known at compile time.
For example, for sgd in `update`:
If I understand Julia correctly, then `lr_scheduler` is a boxed value in an `Any` block, and thus the appropriate `get_learning_rate` variant needs to be looked up at runtime, which can have performance implications. What do you think?
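The concern can be reproduced in isolation with a small sketch. It uses Julia 1.x syntax, and `FixedRate`, `SGDOptionsAny`, the `0.01` fallback rate, and the two-argument `get_learning_rate` signature are hypothetical stand-ins, not the package's definitions.

```julia
# Hypothetical stand-ins for illustration; not the package's actual definitions.
abstract type AbstractLearningRateScheduler end

struct FixedRate <: AbstractLearningRateScheduler
    lr::Float64
end

# Assumed signature: scheduler plus iteration count.
get_learning_rate(s::FixedRate, iter::Int) = s.lr
get_learning_rate(::Nothing, iter::Int) = 0.01   # assumed fallback rate

# Options struct with an untyped scheduler field, as described above.
mutable struct SGDOptionsAny
    lr_scheduler::Any
end

opts = SGDOptionsAny(FixedRate(0.1))

# The declared field type is Any, which is not concrete, so the method of
# get_learning_rate has to be selected at runtime from the boxed value.
println(fieldtype(SGDOptionsAny, :lr_scheduler))
println(isconcretetype(fieldtype(SGDOptionsAny, :lr_scheduler)))
println(get_learning_rate(opts.lr_scheduler, 1))
```

Running `@code_warntype` on a function that calls `get_learning_rate(opts.lr_scheduler, iter)` is another way to see the `Any`-typed value in the inferred code.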