boostorg / hana

Your standard library for metaprogramming
http://boostorg.github.io/hana
Boost Software License 1.0
1.69k stars 215 forks source link

Function objects should have a unique address #76

Closed ldionne closed 8 years ago

ldionne commented 9 years ago

When used from different translation units, function objects should resolve to the same address. In other words, we want boost::hana::drop to have the same address in all translation units. This is related to #75.

vtnerd commented 9 years ago

Eric Niebler discussed the same thing in this paper. I think the technique should work for Hana too.

ldionne commented 9 years ago

Thanks a lot for the pointer. I was aware of his blog post on the matter, but not this extended explanation.

ldionne commented 9 years ago

I am in the process of fixing this bug. Here's my current solution, which does not quite work:

Normally, I would have

struct _drop { };
constexpr _drop drop{};

Following @ericniebler's article, my fix goes like:

template <typename T>
constexpr T static_constexpr{};

struct _drop { };
namespace { constexpr auto const& drop = static_constexpr<_drop>; }

However, the addresses of drop still appear to be different when fetched from different translation units, which is incorrect. Any clue what is going on? More precisely, here's a minimal working example showing my issue:

In a.cpp

#include <iostream>

template <typename T>
constexpr T static_constexpr{};

struct _drop { };
namespace { constexpr auto const& drop = static_constexpr<_drop>; }

void show_pointers(void const* p) {
    std::cout << p << std::endl;
    std::cout << &drop << std::endl;
}

In b.cpp

template <typename T>
constexpr T static_constexpr{};

struct _drop { };
namespace { constexpr auto const& drop = static_constexpr<_drop>; }

void show_pointers(void const* p);

int main() {
    show_pointers(&drop);
}

And the output is

0x107344f22
0x107344f23
pfultz2 commented 9 years ago

I am implementing this currently in my Fit library(and it works), but I believe you need to declare a static member. For function objects, you should be able to do this:

template<class T>
struct static_const
{
    static constexpr T value {};
};

template<class T>
constexpr T static_const<T>::value;

struct _drop { };
static constexpr const auto& drop = static_const<_drop>::value;

You can see the source here for FIT_STATIC_FUNCTION(I don't know how easy it is to follow, since I do extra things to ensure that lambdas have unique addresses across TUs as well).

You could also look at how Eric Niebler's range library implements this(he has a static_const class for declaring function objects).

ldionne commented 9 years ago

I can confirm that your suggestion works, but I find this curious. This means that variable templates resolve to different addresses, yet nested static members in a template do resolve to the same address.

ldionne commented 9 years ago

@pfultz2 Do you actually think it is worth going through these hoops to ensure the uniqueness of address for function objects? I'm thinking more and more that this is superfluous and makes the code more complicated for 0 gain.

vtnerd commented 9 years ago

From the N3797 draft:

3.2.4 Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is odr-used. ... 3.2.6 There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements.

N4296 has the same language. So a ODR-used variable-template should be a violation of the ODR if it appears in multiple TU. This [EDIT: referring to Paul Fultz suggestion] is not a violation of the ODR, because 3.2.6 provides an exception for a static data member of a class template.

Or at least thats my interpretation of this fun corner case of the language. Maybe Eric has more information since he wrote a lot more on the subject.

ldionne commented 9 years ago

Right, so no ODR violation for variable templates, but they still get different addresses. Is this as bad as I make it out to be?

ericniebler commented 9 years ago

That's pretty bad IMO. Probably worth following up with the committee, or at least shooting James Widman an email.

\e

On Thu, May 21, 2015, 11:11 AM Louis Dionne notifications@github.com wrote:

Right, so no ODR violation for variable templates, but they still get different addresses. Is this as bad as I make it out to be?

— Reply to this email directly or view it on GitHub https://github.com/ldionne/hana/issues/76#issuecomment-104375305.

ericniebler commented 9 years ago

BTW, the analysis of the ODR in the draft of my paper is wrong. Please see the official version of N4381 here. (Never take anything you read in a DXXXX paper as gospel truth. Those are drafts. Always look for the NXXXX paper.)

ldionne commented 9 years ago

Alright, thanks!

vtnerd commented 9 years ago

My understanding is that is an ODR violation (thus the multiple addresses for the same object), and constexpr implies internal linkage. The compiler/linker provides no error, as per 3.2.4, since the internal linkage indicates that the objects are different. Joel de Guzman came to the same conclusion on a similar issue in X3. But as Eric just said, maybe someone with better knowledge on the subject should be asked. FWIW, using the address of a hana function does seem like a rare case.

ldionne commented 9 years ago

Actually, it seems like it should never happen. Hence, I am left wondering whether it is at all useful to try and provide unique addresses. As you can see in b5860b7, it bloats the code quite a bit.

ldionne commented 9 years ago

Crap: a variable is ODR-used when it is passed as a reference to a function. Hence, if you take Hana function object by reference in two different translation units and then link them together, this is an ODR violation IIUC.

pfultz2 commented 9 years ago

@pfultz2 Do you actually think it is worth going through these hoops to ensure the uniqueness of address for function objects?

For a general-purpose library, yes. If the compiler doesn't inline the function object, it can bloat the final executable with multiple copies of the same function object(because there is multiple addresses).

Crap: a variable is ODR-used when it is passed as a reference to a function. Hence, if you take Hana function object by reference in two different translation units and then link them together, this is an ODR violation IIUC.

Oh, that is probably another reason.

pfultz2 commented 9 years ago

@ericniebler In the paper it says:

The anonymous namespace is needed to keep the std::begin reference itself from being multiply defined.

Wouldn't static work as well? It seems to work on gcc and clang using static, ie:

static constexpr auto const& begin = __static_const<__detail::__begin_fn>;
ldionne commented 9 years ago

For a general-purpose library, yes. If the compiler doesn't inline the function object, it can bloat the final executable with multiple copies of the same function object(because there is multiple addresses).

I don't expect anybody to actually store the addresses of those function objects. Furthermore, since those function objects are empty, I would expect the compiler to optimize away argument passing via reference (which would normally require the address of the function object). Hence, I don't think code bloat is an issue.

pfultz2 commented 9 years ago

I don't expect anybody to actually store the addresses of those function objects.

It doesn't matter. The compiler could still choose not to inline the function even if you don't take the address(for example if you use -fno-inline or the user reached some internal limit with inlining).

Furthermore, since those function objects are empty

Empty != 0. Each object will take at least one byte or more in the final executable depending on binary formats.

Eric's solution is pretty simple and easy to apply everywhere(at least for function objects) to avoid these issues.

ldionne commented 9 years ago

Ok, I agree this needs to be fixed, but Eric's fix is nowhere near "easy to apply everywhere" for Hana, which uses variable templates and decltype in conjunction (I don't want decltype(type<T>) to be a reference, for example): https://travis-ci.org/ldionne/hana/jobs/63536584#L1484

Also, Clang 3.5 seems to have issues: https://travis-ci.org/ldionne/hana/jobs/63536579#L704

pfultz2 commented 9 years ago

I don't want decltype(type) to be a reference

That is a good point. It seems like it would be easier to make type<T> a class instead of a variable template. Then again, if concepts lite goes the route of using variable templates, they may tweak the language to make this problem go away, anyway.

ldionne commented 9 years ago

That is a good point. It seems like it would be easier to make type<T> a class instead of a variable template.

Yes, it would solve a couple of problems, but we would have to write type<T>{} all around the place instead of type<T>, which is not only uglier but also defeats the "philosophy" of Hana which is "types as values". We would then have the same problem for integral_constant, metafunction and all the other related constructs. I guess this idea is worth considering, but I'm a bit reluctant to bastardize Hana's interface for a corner case issue like this.

By the way, my (naive) point of view on the whole issue is that global objects defined in header files should simply be collapsed into a single object by the linker, without creating a ODR violation. I know the language won't change since it might break existing code, but to me it seems like a flaw in the language rather than in the design of the library.

viboes commented 9 years ago

You must design a library following the rules of the languages, otherwise the library is bad designed. If the language is bad designed, it should be changed.

pfultz2 commented 9 years ago

By the way, my (naive) point of view on the whole issue is that global objects defined in header files should simply be collapsed into a single object by the linker

The linker will do this for template functions and classes. It doesn't do it for global variables, though.

I know the language won't change since it might break existing code

Well, I don't know if that is true. It would be nice if the language at least made global constexpr objects unique and ODR-friendly, but then again there is this.

JamesWidman commented 9 years ago

There is an open issue related to this for the core language; for future reference it is:

"2104. Internal-linkage constexpr references and ODR requirements"

... which, in the next week-ish, should appear in the new version of the issues list here:

http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_active.html

I think there is consensus (in the core working group) that an implementation should be able to look through the reference to the object in a case like this; a note about that should appear when the issues list is updated.

I'm less certain about the expected object identity for variable template instantiation in different translation units; I'll get back to you on that.

ldionne commented 9 years ago

@JamesWidman Thanks for the additional info.

MarisaKirisame commented 9 years ago

I have this problem too. Are there any workaround?

ldionne commented 9 years ago

You can use the trick explained in @pfultz2 's comment above.

MarisaKirisame commented 9 years ago

@ldionne Thx. I just merge all of my cpp file... Does not bring any issue except compile time.