edouarda / brigand

Instant compile time C++ 11 metaprogramming library
Boost Software License 1.0
572 stars 48 forks source link

Rework lambda mechanism to improve speed and usability #177

Closed odinthenerd closed 8 years ago

odinthenerd commented 8 years ago

moved here to stop spamming twitter

Edit:

Relevant tweet: https://twitter.com/odinthenerd/status/740900401340354560

odinthenerd commented 8 years ago

Found another tweak to 'new brigand' model, now seems to be neck and neck again.

What is faster seems to be super benchmark dependent. Benchmarking with 1000 transforms processing (adding pointer to) only one unique element each they are super similar. However when processing 1000 lists of 4 unique elements each the results are interesting:

template<template<typename...> class F>
struct magic{};

template<typename T>
struct classicFunctor {
    using type = T*;
};

template<typename T>
using magicFunctor = T*;

struct metaFunctor {
    template<typename T>
    using invoke = T*;
}; 

meta::transform < T, metaFunctor >;     //  220 
meta::transform < T, meta::quote<classicFunctor> >;     //  660  
brigand::transform<T, classicFunctor<_1>>;    //180 
brigand::transform<T, magic<magicFunctor>>;      // 140 

"magic" is an alias wrapper (inspired by metal) which we will probably name quote (or turbocharge?) or something and can coexist with the "new brigand" strategy. I think that this shows that at least on my machine with clang 3.7 that meta is beatable with a unified lambda syntax.

odinthenerd commented 8 years ago

We could name "magic" "eager" and also allow "eager" to wrap the lambda in the cascading lambda use case if args match as in:

template<typename T>
using tr = transform<T, lazy::transform<_1, eager<std::add_pointer_t>>>;
ericniebler commented 8 years ago

I know meta::quote is implemented in terms of defer because of a core issue. There is a comment in the code. defer makes quote instantiate more templates than it would otherwise need to. The core issue only surfaces in certain situations. (I could quickly find the problem again by removing the workaround and seeing what breaks.) I'm curious whether brigand suffers from the same problem and will need a similar workaround. I would check but I'm not at my computer.

ericniebler commented 8 years ago

"core issue" == "core language issue"

odinthenerd commented 8 years ago

yes I was expecting that was the reason for 'meta::transform < T, meta::quote >' being so slow. I also expect that we will find bugs or corner cases in my implementation so no cracking champagne yet ;) I am coming around to the idea of an all eager interface as suggested by @brunocodutra although It does fuck with your mind at first glance. If we were to rename 'eager' again to 'call' and give it a signature like meta::defer (as in template<template<typename...> class F, typename...Args>) it would result in something like this which doesn't look half bad:

template<typename T>
using tr = transform<T, call<transform, _1, call<std::add_pointer_t, _1>>>;
brunocodutra commented 8 years ago

The core issue shows up specificaly when instantiating an alias template with variadic arguments in place of non-variadic parameters outside of a SFINAE context. Metal works around it by avoiding instantiating things within nested alias templates, hence the pattern matching to extract the metafunction. Otherwise a double instantiation within a SFINAE context is required.

odinthenerd commented 8 years ago

Can't seem to get my head around transforming something like this:

template<typename T>
using tr = transform<T, is_same<remove_reference<_1>, conditional<is_same<int*,add_pointer<_1>>,bool,add_pointer<_1>>>>; 

into only eager... maybe keep hybrid?

[core language issue] I think 'new brigand' is not affected because I only do pattern matching specialization, no nested stuff (its slow any way).

brunocodutra commented 8 years ago

In Metal I'd make it lazy by replacing /metafunction</ by /bind<lambda, /, which can be made terser by the alias templates I suggested before.

Brigand's bind doesnt appear to allow this as easily, because the MetafunctionClass being bound isn't implicit protected, so I guess it would require some tweaking.

brunocodutra commented 8 years ago

I mean, I would make the eager equivalent lazy by following the rule above.

odinthenerd commented 8 years ago

if we had wrappers 'call' and 'lcall' (lazy call) which also implicitly protect the user could decide on a lambda by lambda basis which syntax form they want and we could get rid of the 'protect', 'magic', '_l' and whatnot I have been using. That way we could still pattern match aliases because they would be wrapped in a 'call' at top level and have the speed that comes with it (equivalent to 'magic' in the last benchmark). We could assume 'lcall' at top level to make things terser (assuming 'call' is not possible) Following this idea @ericniebler Cartesian product would look like this:

template <class L>
using cart_prod = reverse_fold<
    L, list<list<>>,
    l::join<l::transform<
            _2, lcall<l::join<l::transform<parent<_1>, lcall<list<l::push_front<_1, parent<_1>>>>>>>>>>>;

add_pointer would look like this:

template<typename T>
using tr = transform<T, call<std::add_pointer_t>>; //probably fastest possible implementation
//or
template<typename T>
using tr = transform<T, std::add_pointer<_1>>; //slightly slower

we would still need to tag things that are not lambdas but we could call that 'pin' rather than '_p':

remove_if<T,is_same<pin<int>,_1>>

parameters that are not lambdas actually don't happen as often as one might think looking at my kvasir.io code base.

ericniebler commented 8 years ago

The core issue shows up specificaly when instantiating an alias template with variadic arguments in place of non-variadic parameters outside of a SFINAE context. Metal works around it by avoiding instantiating things within nested alias templates, hence the pattern matching to extract the metafunction. Otherwise a double instantiation within a SFINAE context is required.

Interesting! Meta's choice of a nested invoke alias would seem to necessitate this expensive defer workaround that wouldn't otherwise be necessary. That's something to chew on.

For the record, I'm aware that meta's lambda mechanism is heavyweight. It's there as a curiosity, not because I think it's useful or necessary. I didn't believe an MPL-like lambda syntax could be provided without an MPL-like perf hit. If that's not true, it changes things.

I'm not wild about the need for pin. What about the design necessitates it? Can't the lambda evaluation mechanism simply interpret any type it can't deconstruct via pattern matching as a literal type and substitute it directly? Then pin is only needed as an optimization to keep the lambda mechanism from recursing into a particular template instantiation.

odinthenerd commented 8 years ago

yeah pin seems to be the thing people don't like.

Its not just about stopping recursion (as I originally falsely claimed), if you evaluate everything that is not a pin, a call or an lcall as a lazy function then everything is just pattern matching, no SFINAE tests or anything needed. I am not aware of a non SFINAE test which shows whether a type has a nested type alias (i.e. is a lazy function). Besides the user may want to pass a function as a type so we need something like pin any way. You could alternatively wrap every function but that is bloating the common case rather than the uncommon case. I have not tested the speed of using a SFINAE test but it seems to also be a teachability problem if the use of pin relies on whether a type you want to pass as a type happens to have a nested type alias.

odinthenerd commented 8 years ago

It is trivial to make everything that does not mach template<template<typename...> class F be interpreted as a type, that seems hard to teach though.

brunocodutra commented 8 years ago

That's what Metal does and TBH I don't see how this would be harder to teach than by evoking the presence of some nested template.

odinthenerd commented 8 years ago
transform<T, std::is_same<_1,std::array<int,4>> //works
transform<T, std::is_same<_1,std::false_type>> //works
transform<T, std::is_same<_1,std::vector<int>> //error
transform<T, std::is_same<_1,std::string>> //error

this does not seem that intuitive.

odinthenerd commented 8 years ago

coming back to @pdimov 's world view, adding a wrapper around template template arguments (which I called call above) is less of a perf hit as long as you extract it using pattern matching. It is still faster on clang (your mileage may vary) than a nested alias.

meta with custom functor with pure alias 220ms new brigand 180 ms new brigand call wrapper pure alias 140ms dimov impure alias 140ms dimov pure alias 90ms

remember "pure aliases" (where you never need to retrieve a nested type) are the uncommon case so its really 140 vs. 180. That penalty buys you things like overloading (transform would have to take the function first in @pdimov 's world our not be capable of two lists), composition and binding (without creating aliases yourself) and 'generic' algorithms (no non types in public interface).

Having algorithms take non type parameters just seems wrong to me, but who knows maybe I'm just being dogmatic. What would Cartesian product look like in @pdimov style?

odinthenerd commented 8 years ago

Looking through ranges v3 here you would need to use pin

meta::apply<
                    meta::quote<meta::lazy::strict_and>,
                    meta::transform<
                        base_concepts_of_t<Concept>,
meta::bind_back<meta::quote<concepts::models>, Ts...>>>;

but I still think

wrap<
    transform<
        base_concepts_of_t<Concept>,
        call<concepts::models,_1,pin<Ts>...>>,
    strict_and>;

is sexier. update: oops this would probably also be slower, let me have a closer look at what ranges is doing exactly here. update2: bind_back equivalent is not hard to optimize for one and two parameters, _3 or more are never really used (ranges only uses them in unit tests). bind_front equivalent as in call<F,pin<Ts>...,_1> is harder to make fast if there are a lot of Ts. In ranges there is never more than one Ts but I could picture this happening out in the wild.

brunocodutra commented 8 years ago

@porkybrain I think there was a typo in a previous message of yours, which led to some miscommunication. You mentioned any type that does not match template<template<typename...> class F is considered a regular type (i.e. not a lambda). That is clearly a typo, and I supposed you meant template<template<typename...> class> class F (notice the trailing > class), but now I realize you meant template<typename...> class F instead (notice the missing leading template<).

To clarify matters, Metal only considers types that match template<template<typename...> class> class F to be considered callables. That's what I claim to be intuitive and easy to teach.

Update: I agree very much that limiting lambdas to types that match template<typename...> class F is very counterintuitive because it overlaps with the definition of lists. That's why Metal moved away from the lazy design to begin with and also why I am strongly against the hybrid design.

brunocodutra commented 8 years ago

I see that I might not be very clear when I claim lazy semantics may be expressed using an all-eager implementation, so let me clarify the design:

Let f and gs... be any eager metafunctions, while F = lambda<f> and Gs = lambda<gs>... are types that match template<template<typename...> class> class.

Let invoke<F, args...> := f<args...> and invoke<Gs, args...>... := gs<args...>....

Now assume bind is implemented so that invoke<bind<F, Gs...>, args...> := invoke<F, invoke<Gs, args...>...>, which is thus equivalent to f<gs<args...>...>.

Finally assume lazy versions of algorithms are defined as follows

namespace lazy {
    template<typename L>
    using reverse = bind<lambda<::reverse>, L>;

    template<typename X, typename Y>
    using equal = bind<lambda<::equal>, X, Y>;

   /* ... */
}

Using lazy aliases defined as such, is_palindrome could be expressed exactly as in MPL, without the need for actual lazy metafunctions.

using is_palindrome = lazy::equal<_1, lazy::reverse<_1>>;

static_assert(invoke<is_palindrome, list<int, void, void, int>>{}, "");

That is exactly how Metal is designed.

odinthenerd commented 8 years ago

Sounds like I need to have a look at your design more closely. I came down with a cold today so my IQ dropped by like half since this morning so I'm not making any promises ;)

edouarda commented 8 years ago

@porkybrain perhaps you are overthinking this. It's probably better to iterate on designs than to try to come up with a "perfect" solution.

odinthenerd commented 8 years ago

Hehe I don't want to be the guy that broke brigand twice :-)

sometimes its good to get down to first principals. I'm not sure I'm there yet.

pdimov commented 8 years ago

What would Cartesian product look like in @pdimov style?

https://github.com/pdimov/mp11/blob/master/include/boost/mp11/algorithm.hpp#L205 https://github.com/pdimov/mp11/blob/master/test/mp_product.cpp

odinthenerd commented 8 years ago

Ok so basically mp11 product is hand coded for speed... I guess it all comes down to our expected user and use case. We will never be as fast with lambdas as the hand coded equivalent so it all comes back to the speed vs. Readability/writability/teachability trade off.

A type wrapped template, which call essentially is, seems to be the next fastest thing possible. We can add an overload accepting call without breaking brigand. Seems to be a good first step in order to give trivial stuff, which is the common case, more speed.

odinthenerd commented 8 years ago

this is getting hopelessly long, moving all 'call' related conversation to the pull request #185, and all lazy/cascading/variadic stuff to #186