alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0
356 stars 74 forks source link

Compilation fails in Debug mode #754

Closed kloppstock closed 4 years ago

kloppstock commented 5 years ago

Compilation fails in Debug mode if variables needed to create a workdiv are stored as static constexpr inside a struct. Following code for example fails to compile in Debug mode (for example if compiled with g++ -g main.cpp):

// main.cpp
#include "test.hpp"

int main() {
  auto wd = getWorkDiv<CpuSerial>();

  return 0;
}
// test.hpp
#pragma once

#include <alpaka/alpaka.hpp>

// Defines for dimensions and types.
using Dim = alpaka::dim::DimInt<1u>;
using Size = uint64_t;
using WorkDiv = alpaka::workdiv::WorkDivMembers<Dim, Size>;

template <typename TAccelerator> WorkDiv getWorkDiv() {
  return WorkDiv(TAccelerator::blocksPerGrid, TAccelerator::threadsPerBlock,
                 TAccelerator::elementsPerThread);
}

struct CpuSerial {
  static constexpr Size elementsPerThread = 1u;
  static constexpr Size threadsPerBlock = 1u;
  static constexpr Size blocksPerGrid = 1024u * 512u;
};

The resulting error message looks like this:

/tmp/cc0i3fZ2.o: In function `alpaka::workdiv::WorkDivMembers<std::integral_constant<unsigned long, 1ul>, unsigned long> getWorkDiv<CpuSerial>()':
test.hpp:12: undefined reference to `CpuSerial::elementsPerThread'
test.hpp:12: undefined reference to `CpuSerial::threadsPerBlock'
test.hpp:12: undefined reference to `CpuSerial::blocksPerGrid'

When adding the following Code to the end of the test.hpp file or the main.cpp file, it compiles:

constexpr Size CpuSerial::elementsPerThread;
constexpr Size CpuSerial::threadsPerBlock;
constexpr Size CpuSerial::blocksPerGrid;

If the workdiv is replaced with a simple class, it also compiles:

#pragma once

#include <cstdlib>

using Size = size_t;
class WorkDiv {
private:
  Size x, y, z;

public:
  WorkDiv(Size x, Size y, Size z) : x(x), y(y), z(z) {}
};

template <typename TAccelerator> WorkDiv getWorkDiv() {
  return WorkDiv(TAccelerator::blocksPerGrid, TAccelerator::threadsPerBlock,
                 TAccelerator::elementsPerThread);
}

struct CpuSerial {
  static constexpr Size elementsPerThread = 1u;
  static constexpr Size threadsPerBlock = 1u;
  static constexpr Size blocksPerGrid = 1024u * 512u;
};

When compiling the code in Release mode with g++ -O3 main.cpp it always works. The following cppreference entry might be related: https://en.cppreference.com/w/cpp/language/definition

sbastrakov commented 5 years ago

In your example the code rightfully fails to compile (guess in Release smth is optimized away and so passes by accident) as the alpaka::workdiv::WorkDivMembers constructor takes parameters by reference and thus it is an odr-usage. As far as I can see, there is no reason it can't be done by value, which would solve the issue for your example. However, alpaka itself seem to not use the code in that fashion, so maybe it is not supposed to be used like that? (I have no idea, just a guess)

BenjaminW3 commented 5 years ago

@sbastrakov is right. When not being used within another constexpr context, the static constexpr is in no way different from a static const variable which has to be defined somewhere in the binary exactly once. I think that the compiler is not right in allowing the code to compile in release mode.

sbastrakov commented 5 years ago

@BenjaminW3 I think it is actually not as strict as you describe: violation is taking address of such a static constexpr variable, including passing by reference to a function. Think in our example, in Release compiler just inlines stuff and nothing is passed by reference as the result. In case we pass by value it should be fine.

BenjaminW3 commented 5 years ago

@kloppstock Please try if taking the parameters by value fixes the problem.

ax3l commented 5 years ago

Maybe in parallel we could think if we might want to make those arguments just passed by const value in the constructor. We could rely on copy elision here to init the members: https://abseil.io/tips/117 This makes usage easier since it works with l- and r-values and even has zero-copy for r-values. I think there is no need to enforce l-values here as we currently do, if I am not missing something. Update: oh, that's what you meant as well.

kloppstock commented 5 years ago

@BenjaminW3 Do you mean like in the last example of my initial post?

sbastrakov commented 5 years ago

I think @BenjaminW3 meant to change here . Hopefully this does not break any other code (to me looks like it should be fine).

kloppstock commented 5 years ago

Thanks, that makes sense. This part of the code works now but we use the CpuSerial::blocksPerGrid in a alpaka::kernel::createTaskKernel() call which also takes the arguments per reference.

ax3l commented 5 years ago

Yep, just like constructors, factories should not use constant reference arguments but either && and/or just by value with copy elision. (For example: std::make_shared) Probably createTaskKernel should be changed as well.

Some of these constructs in many of our codes originate from our recent C++98 days, where both copy elision and universal references where not yet introduced into the language. Any PRs that clean up the constructors and factories of alpaka, cupla, PMacc and PIConGPU with that regard are highly appreciated!

There is even a clang-tidy "modernize" check & transformer for this, which can help us to automatically update code: https://clang.llvm.org/extra/clang-tidy/checks/modernize-pass-by-value.html

TheFl0w commented 5 years ago

Yep, just like constructors, factories should not use constant reference arguments but either && and/or just by value with copy elision.

Sounds good. If construtors are templated anyway, forwarding references are a good choice. For non-templated constructors pass by value + copy elision might be a possibility, but I don't know if it always works. In our specific use-case alpaka::workdiv::WorkDivMembers has a templated constructor, but using forwarding references would probably result in the same error when we pass a static constexpr struct member since we are dealing with a named value which is accepted as a const lvalue ref either way. So only pass by value would fix our problem. I'd say this is a "language defect" that gets fixed in C++17 where static constexpr implies inline and the problem just goes away. I don't know if it's worth updating alpaka for this specific use-case.

ax3l commented 5 years ago

Yep, let's try this already. Actually those are not the most expensive data structures that are passed here and C++11 has copy elision in the language which should work reliable. Not sure if inline is really an issue here for those parameters, but you might thought this more through already. Either way, the most sensitive platform (e.g. register usage) is CUDA on which nvcc is even for older language versions aggressively pre-optimizing what it gets from const and constexprs. For CPU platforms, modern compilers with C++17 are already available which will just do the described optimization anyway when Alpaka is compiled with them (*).

(*) our CMake logic should be updated from CXX_STANDARD/-std=XY (exact version) to cxx_11 (min-version) target features, but we are currently stuck on clang-cuda support with cmake, so just switch this manually when newer standards shall be used for testing.

TheFl0w commented 5 years ago

Here is another minimal example of our specific problem. As was mentioned before, static members of a struct must be defined in a compilation unit so we can take the address of it. The compiler should be okay with us not providing that definition because that definition might be in a different implementation file. What we get instead is a linker error, because we didn't generate a symbol for that missing object. With optimizations enabled, the compiler will inline the function calls and not pass by reference so the missing symbol is never required by the linker in the first place.

struct X {
    static constexpr int value = 10;
};

void foo(const int&) {}

int main() {
    foo(X::value);
    return 0;
}

This code snippet will cause a linker error in debug builds using C++14 or older and it will work perfectly fine for all build types using C++17. The C++17 standard introduces the inline keyword for variables, allowing multiple definitions just like inline functions. It also implicitly treats static constexpr variables as inline, solving the problem without having to make any changes to the existing code.

ax3l commented 5 years ago

But Y U not pass as temporary:

    foo(int(X::value));

or else truly define it:

struct X {
    static constexpr int value = 10;
};

constexpr int X::value;

void foo(const int&) {}

int main() {
    foo(X::value);
    return 0;
}

like all C++11/14 programmers? ;-) (See first four answers in this SO).

Of course, you can also put your definition elsewhere and link it.

Yes, as proposed above, accepting by value will solve this already for C++11 & C++14. Do you like to submit a PR? :)

sbastrakov commented 4 years ago

I believe this one is solved, closing.