Merely calling the function templates from another function does not guarantee that their symbols are visible outside the translation unit: the compiler may, for example, inline the calls and never emit external symbols for those template instantiations.
Explicit instantiation of templates solves exactly that problem.
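To illustrate the difference, here is a minimal sketch. The file name, the `axpy` function template, and the helper `force_instantiation_by_call` are hypothetical and are not taken from this commit; they only show the general pattern of replacing a "call it so it gets instantiated" trick with explicit instantiation definitions.

```cpp
// axpy.cpp: hypothetical file and names, for illustration only.
#include <cstddef>

// Function template whose definition lives only in this translation unit.
template <typename T>
void axpy(T alpha, const T* x, T* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}

// Fragile approach: this call implicitly instantiates axpy<float>, but the
// compiler is free to inline it here and emit no out-of-line symbol that
// other translation units could link against.
void force_instantiation_by_call() {
    float x[1] = {1.0f};
    float y[1] = {0.0f};
    axpy(2.0f, x, y, 1);
}

// Robust approach: explicit instantiation definitions. The compiler must
// emit these specializations in this translation unit, so they can be
// linked from elsewhere regardless of inlining decisions.
template void axpy<float>(float, const float*, float*, std::size_t);
template void axpy<double>(double, const double*, double*, std::size_t);
```

With this in place, other translation units only need a declaration of the template (for instance in a header) to link against the `float` and `double` versions, and the helper that existed purely to trigger instantiation can be deleted.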
As a drive-by fix, this commit also removes the declarations of some function templates that were never defined.
I ran the CIFAR-10 and MovieLens examples and saw no performance regression: wall-clock times before and after my changes differ by less than 1%.
Before my changes:

Training MovieLens
real 0m6.850s
user 0m4.468s
sys 0m2.300s

Predicting MovieLens
real 0m58.903s
user 0m54.644s
sys 0m4.064s

Training CIFAR-10
real 1m14.164s
user 1m9.888s
sys 0m4.240s

After my changes:

Training MovieLens
real 0m6.890s
user 0m4.348s
sys 0m2.484s

Predicting MovieLens
real 0m59.091s
user 0m54.996s
sys 0m3.884s

Training CIFAR-10
real 1m14.239s
user 1m9.740s
sys 0m4.468s

The exact script I used to run those benchmarks follows, for reference: