Here are some attempts at playing with templates.
I'm using g++ 11.4 and clang++ 18 (trunk).
test.cc
#include <iostream>
#include <string>
#include "template.h"
int main(void) {
std::cout << get_value<int>() << std::endl;
std::cout << get_value<int>() << std::endl;
std::cout << get_value<int>() << std::endl;
std::cout << std::endl;
std::cout << get_value<unsigned short>() << std::endl;
std::cout << get_value<unsigned short>() << std::endl;
return 0;
}
template.h
#include <atomic>
template <typename T>
T get_value() {
static std::atomic<T> value = 0;
return value++;
}
This works, of course.
extern templates
This is the approach with explicit declaration and instantiation of templates as I understood it.
template.h
template <typename T>
T get_value();
extern template
int get_value<int>();
extern template
unsigned short get_value<unsigned short>();
template.cc
#include <atomic>
#include "template.h"
template <typename T>
T get_value() {
static std::atomic<T> value = 0;
return value++;
}
template
int get_value<int>();
template
unsigned short get_value<unsigned short>();
This works, resulting in
$ nm -A -C *.o | grep get_value
template.cc.o: 0000000000000000 W int get_value<int>()
template.cc.o: 0000000000000000 W unsigned short get_value<unsigned short>()
template.cc.o: 0000000000000000 u get_value<int>()::value
template.cc.o: 0000000000000000 u get_value<unsigned short>()::value
test.cc.o: U int get_value<int>()
test.cc.o: U unsigned short get_value<unsigned short>()
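As a compact illustration of the pattern above, the following single-file sketch puts the explicit instantiation declarations, the template definition and the explicit instantiation definitions in one translation unit. In the real layout the first part would live in template.h and the second in template.cc. The brace initialisation of the atomic is a small deviation from the code above, assumed here only so that the sketch also compiles before C++17.

```cpp
#include <atomic>

// --- what template.h would contain ---
// Declaration only: the body is not visible to users of the header.
template <typename T>
T get_value();

// Explicit instantiation declarations: these two specialisations are
// instantiated elsewhere, so users of the header must not instantiate them.
extern template int get_value<int>();
extern template unsigned short get_value<unsigned short>();

// --- what template.cc would contain ---
template <typename T>
T get_value() {
  // One counter per specialisation, initialised by the first call.
  static std::atomic<T> value{0};
  return value++;  // post-increment returns the previous value
}

// Explicit instantiation definitions: emit code for exactly these types.
template int get_value<int>();
template unsigned short get_value<unsigned short>();
```

Each specialisation keeps its own counter, so successive calls to get_value<int>() are not affected by calls to get_value<unsigned short>().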
Removing the extern declarations also works:
template.h
template <typename T>
T get_value();
template.cc
#include <atomic>
#include "template.h"
template <typename T>
T get_value() {
static std::atomic<T> value = 0;
return value++;
}
template
int get_value<int>();
template
unsigned short get_value<unsigned short>();
and results in the same symbols:
$ nm -A -C *.o | grep get_value
template.cc.o: 0000000000000000 W int get_value<int>()
template.cc.o: 0000000000000000 W unsigned short get_value<unsigned short>()
template.cc.o: 0000000000000000 u get_value<int>()::value
template.cc.o: 0000000000000000 u get_value<unsigned short>()::value
test.cc.o: U int get_value<int>()
test.cc.o: U unsigned short get_value<unsigned short>()
I've tried using only one instantiation, to see if that would fail at compile or link time:
template.h
template <typename T>
T get_value();
extern template
int get_value<int>();
template.cc
#include <atomic>
#include "template.h"
template <typename T>
T get_value() {
static std::atomic<T> value = 0;
return value++;
}
template
int get_value<int>();
This compiles but fails at link time, because test.cc still calls get_value<unsigned short>(), which is never instantiated:
g++ -std=c++17 -O3 -g -Wall -fPIC -march=native -mtune=native -c -MMD template.cc -o template.cc.o
g++ -std=c++17 -O3 -g -Wall -fPIC -march=native -mtune=native -c -MMD test.cc -o test.cc.o
g++ -std=c++17 -O3 -g -Wall -fPIC -march=native -mtune=native template.cc.o test.cc.o -o test
/usr/bin/ld: test.cc.o: in function `main':
/home/fwyzard/test/extern_template/v2_bad/test.cc:13: undefined reference to `unsigned short get_value<unsigned short>()'
/usr/bin/ld: /home/fwyzard/test/extern_template/v2_bad/test.cc:14: undefined reference to `unsigned short get_value<unsigned short>()'
collect2: error: ld returned 1 exit status
Finally, the approach from this PR:
template.h
#ifndef template_h
#define template_h
template <typename T>
T get_value();
#endif  // template_h
template.cc
#include <atomic>
#include "template.h"
template <>
int get_value<int>() {
static std::atomic<int> value = 0;
return value++;
}
template <>
unsigned short get_value<unsigned short>() {
static std::atomic<unsigned short> value = 0;
return value++;
}
This works and produces slightly different symbols:
$ nm -A -C *.o | grep get_value
template.cc.o: 0000000000000000 T int get_value<int>()
template.cc.o: 0000000000000020 T unsigned short get_value<unsigned short>()
template.cc.o: 0000000000000004 b get_value<int>()::value
template.cc.o: 0000000000000000 b get_value<unsigned short>()::value
test.cc.o: U int get_value<int>()
test.cc.o: U unsigned short get_value<unsigned short>()
Now the template specialisations are marked as T (code) instead of W (weak), and the function static variables are marked as b (BSS local) instead of u (unique global symbol).
If we use the explicit instantiation definitions (with or without the extern declarations) it's like we are using inline functions: the linker has to resolve the duplications at link time, and GCC uses the special u symbol to instruct the dynamic linker to keep a single instance of the static variables (clang++ marks them as V, for explicit weak objects).
If we use the explicit template specialisation it's like we are using normal functions: they should be defined in a single translation unit, and the static variables are marked as local.
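To make the "normal functions" analogy concrete, here is a minimal single-file sketch of the explicit specialisation approach. Unlike the template.h shown above, it also declares the two specialisations next to the primary template. That is an assumed addition, not part of the original files: as far as I understand the standard, an explicit specialisation is expected to be declared before its first use in every translation unit that uses it, with no diagnostic required otherwise.

```cpp
#include <atomic>

// --- what template.h would contain ---
template <typename T>
T get_value();

// Declarations of the explicit specialisations (assumed addition).
template <>
int get_value<int>();
template <>
unsigned short get_value<unsigned short>();

// --- what template.cc would contain ---
// Each full specialisation is an ordinary, non-inline function, so it
// has to be defined in exactly one translation unit.
template <>
int get_value<int>() {
  static std::atomic<int> value{0};
  return value++;  // post-increment returns the previous value
}

template <>
unsigned short get_value<unsigned short>() {
  static std::atomic<unsigned short> value{0};
  return value++;
}
```

Defining either specialisation in a second translation unit would be a multiple-definition error at link time, exactly as for a regular function.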
Keeping in mind also the potential use cases in CMSSW, what do we prefer for libraries?
Here are the results of some follow up checks.
When loading shared libraries with RTLD_LAZY | RTLD_GLOBAL (as we do in CMSSW) all approaches work correctly.
When loading shared libraries with RTLD_LAZY | RTLD_LOCAL:
- [...] "extern templates", and "Removing the declarations" work as expected;
- [...] static variable.
So, right now my preference would be to use the "Removing the declarations" approach.
After spending some time with Chris trying to understand various aspects of the topic, we agree with the "Removing the declarations" approach.
Thanks, I've updated the platform() and devices() part accordingly.
I think I found a better solution for the initialise() functions.
Looks good to me (I also like the new way to handle initialise()).
Move the platform and device objects for the host and the accelerators to function static variables, initialised by the first call to their functions.
Change
to
The underlying platform() and devices() are still templated on the Platform, so they are initialised only once even if there are multiple back-ends that share the same platform (e.g. serial and tbb).
Split alpakaDevices.h into alpaka/devices.h / alpaka/devices.cc and host.h / host.cc.
Split alpakaConfig.h into config.h and common.h. The latter includes only the definitions that are independent from any back-end, in the alpaka_common namespace.
Finally, rename alpakaWorkDiv.h to workdivision.h, alpakaMemory.h to memory.h.
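The idea of function static objects templated on the platform can be sketched as follows. This is only an illustration of the pattern described above, not the code from the PR: CpuPlatform, CpuDevice and the sketch_common, sketch_serial and sketch_tbb namespaces are hypothetical stand-ins for the real platform types and back-end namespaces.

```cpp
#include <vector>

// Hypothetical device and platform types standing in for the real ones.
struct CpuDevice {
  int index;
};

struct CpuPlatform {
  using Device = CpuDevice;
  static int constructed;  // counts how many platform objects were created
  CpuPlatform() { ++constructed; }
};
int CpuPlatform::constructed = 0;

namespace sketch_common {
  // One platform object per TPlatform, created by the first call.
  template <typename TPlatform>
  TPlatform const& platform() {
    static const TPlatform instance{};
    return instance;
  }

  // One device list per TPlatform, also created by the first call.
  // The single device with index 0 is a placeholder for a real query.
  template <typename TPlatform>
  std::vector<typename TPlatform::Device> const& devices() {
    static const std::vector<typename TPlatform::Device> list{
        typename TPlatform::Device{0}};
    return list;
  }
}  // namespace sketch_common

// Two back-ends that share the same platform type: both forward to the
// same template specialisation, hence to the same static object.
namespace sketch_serial {
  inline CpuPlatform const& platform() {
    return sketch_common::platform<CpuPlatform>();
  }
}  // namespace sketch_serial

namespace sketch_tbb {
  inline CpuPlatform const& platform() {
    return sketch_common::platform<CpuPlatform>();
  }
}  // namespace sketch_tbb
```

Because both back-end wrappers call the same specialisation, the platform is constructed once no matter which back-end asks for it first, and the initialisation of a function static is thread-safe since C++11.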