Draft: Add Small Buffer-Optimization for lambda and functor created delegates (C11)

devjoa commented 3 months ago

Hi, I've changed the creation of lambda and functor delegates, where I use the area for the pointer to create a Small Buffer-Optimized area instead of a reference to these types. This will be a breaking change, and may also in the worst case be a silent change, as the compiler will not detect some cases, i.e. no compile error. However, IMHO I think this is a pretty good improvement that will make it a bit easier to use for lambdas, and add a behavior I've been missing for a long time.

This commit adds 'Small Buffer Optimization' to functor and lambda functions with small footprint, i.e. the same size as 'void *' or less, and will therefore mimic the std::function behavior for these types, instead of being a reference to them as before.

This means that the following operations are now allowed to delegate with deferred/out-of-scope calls:

etl::delegate<void()> Class::method() {
  etl::delegate<void()> d {[this](){ this->do_something(); }};
  return d;
}

A caveat is that the old behavior where everything became a reference instead of an object is changed and the user must now use a reference_wrapper to gain the same behavior, i.e. using etl::ref or etl::cref:

  auto f = [&](){ /* ... */ };
  etl::delegate<void()> d { etl::ref(d) };
  d();

This change of behavior also matches the C++ Core Guidelines for how to pass parameters [https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f15-prefer-simple-and-conventional-ways-of-passing-information]

Please review my MR, and I'm pleased to get feedback. I have not done any performance testing so it may add an extra penalty, but when I've elaborated with compiler explorer, it seems to be similar.

Also notice that I haven't changed the create functionality of delegates, as I don't know if it should be kept as is...?

semanticdiff-com[bot] commented 3 months ago

Review changes with SemanticDiff.

devjoa commented 3 months ago

Forgot to mention that I've also put a discussion on this here: https://github.com/ETLCPP/etl/discussions/860

jwellbelove commented 3 months ago

I think the main problem I have with this is that its usefulness is limited to functors and lambdas that fit in the space and alignment of a void*.

devjoa commented 3 months ago

Hi @jwellbelove, I understand your concern about its usefulness, also I am beginning to doubt if it is possible to retrofit this functionality because it's a breaking change and relies on STL functionality. However, there are two main reasons why I started to tinker with this change of delegates:

When using delegates as a callback, I mostly use its run-time member function construction. This interface is somewhat clumsy, but I believe it's impossible to make it easier to use, and that's why I would love to use [this](){this->func()} as callback construction.
When I and my colleagues started to use ETL, the delegate interface was one of the things we didn't get hold of at the beginning. Most of us came from a "plain-C" background and thought that the delegate was a lightweight implementation of std::function. However, we found out that lambdas were more or less useless as a callback that the lambda capture object must live even after the delegate construction. During the implementation of this function, I've found that even the test suite uses lambdas in an undefined behavior manner: https://github.com/ETLCPP/etl/blob/f1133d5bbcb48458d1c1b3eee06efeb5c2844e67/test/test_byte_stream.cpp#L1321-L1326

It is not my intention to rant, I think ETLCPP is one of the best things that has happened to embedded C++ and is a reason why I think C++ is possible to use in an embedded environment, but I want to make a point that it's easy to misuse delegates.

jwellbelove commented 3 months ago

I'm just wondering whether a lambda wrapped in a singleton could be a solution and use that as the parameter type instead of a raw functor/lambda. I'm not home this evening so I'll play around with ideas tomorrow.

drewr95 commented 3 months ago

Hi @jwellbelove, I understand your concern about its usefulness, also I am beginning to doubt if it is possible to retrofit this functionality because it's a breaking change and relies on STL functionality. However, there are two main reasons why I started to tinker with this change of delegates:
1. When using delegates as a callback, I mostly use its run-time member function construction. This interface is somewhat clumsy, but I believe it's impossible to make it easier to use, and that's why I would love to use `[this](){this->func()}` as callback construction.

2. When I and my colleagues started to use ETL, the delegate interface was one of the things we didn't get hold of at the beginning. Most of us came from a "plain-C" background and thought that the delegate was a lightweight implementation of std::function. However, we found out that lambdas were more or less useless as a callback that the lambda capture object must live even after the delegate construction. During the implementation of this function, I've found that even the test suite uses lambdas in an undefined behavior manner:
   https://github.com/ETLCPP/etl/blob/f1133d5bbcb48458d1c1b3eee06efeb5c2844e67/test/test_byte_stream.cpp#L1321-L1326
It is not my intention to rant, I think ETLCPP is one of the best things that has happened to embedded C++ and is a reason why I think C++ is possible to use in an embedded environment, but I want to make a point that it's easy to misuse delegates.

We mostly use delegates' compile time construction with objects. Getting rid of it being trivially destructible removes all our delegates from ROM. This would be a major hit to my team and project.

devjoa commented 3 months ago

Hi @drewr95,

We mostly use delegates' compile time construction with objects. Getting rid of it being trivially destructible removes all our delegates from ROM. This would be a major hit to my team and project.

I hope my PR doesn't break that behavior. I believe I've only changed the dynamic construction and not the static ones. But I may be wrong. Are you talking about lambda/functor that is compile-time constructed?

drewr95 commented 3 months ago

Hi @drewr95,

We mostly use delegates' compile time construction with objects. Getting rid of it being trivially destructible removes all our delegates from ROM. This would be a major hit to my team and project.

I hope my PR doesn't break that behavior. I believe I've only changed the dynamic construction and not the static ones. But I may be wrong. Are you talking about lambda/functor that is compile-time constructed?

Hey @devjoa I think your PR will break that behavior. Even though you did not mess with the static construction, you cannot put anything in ROM that is not trivially destructible. I just had this issue with etl::variant (see https://github.com/ETLCPP/etl/issues/859). When I commented earlier about you needing to call the ~T() since you use the placement new, implementing the destructor to call ~T() automatically makes the delegate class non-trivially destructible.

devjoa commented 3 months ago

Hey @devjoa I think your PR will break that behavior. Even though you did not mess with the static construction, you cannot put anything in ROM that is not trivially destructible. I just had this issue with etl::variant (see #859). When I commented earlier about you needing to call the ~T() since you use the placement new, implementing the destructor to call ~T() automatically makes the delegate class non-trivially destructible.

@drewr95 , to my understanding, after reading #859, this will only happen if I add a non-trivial destructor to the delegate. My suggestion was to add a precondition that lambda and functors must be trivially destructible, otherwise, they must be assigned via a reference_wrapper construction that is always trivially destructible. The capture size will be very small and therefore the only practical capture is to catch a reference ~~([&] captures seem to only become a pointer in GCC for example)~~, this is not a problem. However, due to that ETLCPP doesn't support etl::is_trivially_destructible completely, and the PR also has "breaking change" in the interface, I start to believe this PR will be rejected. :smile_cat:

jwellbelove commented 3 months ago

I think ETLCPP is one of the best things that has happened to embedded C++

Thank you, that's much appreciated.

jwellbelove commented 3 months ago

This interface is somewhat clumsy, but I believe it's impossible to make it easier to use

Yes, I've tried to find way of making it less verbose, but I've not found one yet that really makes a difference. The use of auto helps little.

jwellbelove commented 3 months ago

I've had a bit of a play around this morning. Am I heading in the right sort of direction with this code? MakeStaticLambda returns a reference to a static copy of the lambda.

#include "etl/delegate.h"

#include <iostream>

template <typename TLambda>
TLambda& MakeStaticLambda(TLambda lambda_)
{
  static TLambda static_lambda = lambda_;

  return static_lambda;
}

struct S
{
  S()
  {
    MakeDelegates();
  }

  void Test1()
  {
    std::cout << "Hello\n";
  }

  void Test2()
  {
    std::cout << "World\n";
  }

  void MakeDelegates()
  {
    auto lambda1 = [this]() { this->Test1(); };
    auto lambda2 = [this]() { this->Test2(); };

    d1 = etl::delegate<void(void)>(MakeStaticLambda(lambda1));
    d2 = etl::delegate<void(void)>(MakeStaticLambda(lambda2));
  }

  etl::delegate<void(void)> d1;
  etl::delegate<void(void)> d2;
};

int main()
{
  S s;

  auto d1 = s.d1;
  auto d2 = s.d2;

  d1(); // Prints 'Hello'
  d2(); // Prints 'World'

  return 0;
}

drewr95 commented 3 months ago

@drewr95 , to my understanding, after reading #859, this will only happen if I add a non-trivial destructor to the delegate. My suggestion was to add a precondition that lambda and functors must be trivially destructible, otherwise, they must be assigned via a reference_wrapper construction that is always trivially destructible. The capture size will be very small and therefore the only practical capture is to catch a reference ([&] captures seem to only become a pointer in GCC for example), this is not a problem. However, due to that ETLCPP doesn't support etl::is_trivially_destructible completely, and the PR also has "breaking change" in the interface, I start to believe this PR will be rejected. 😸

@devjoa Thanks for the explanation, I understand now. That was my biggest concern.

devjoa commented 3 months ago

I've had a bit of a play around this morning. Am I heading in the right sort of direction with this code? MakeStaticLambda returns a reference to a static copy of the lambda.

That depends on what you want to achieve. If struct S has multiple instances, then as you probably know, it won't work. However, if I manage to create something similar to std::reference_wrapper but for lambda assignment to delegate, and the current public interface is untouched, could that be a way forward?

I'm thinking about something like this:

   etl::delegate<void(void)> d { etl::delegate_lambda([this](){ ... }) };

The etl::delegate_lambda is very preliminary naming. If I manage to prove that no other parts, like ROM allocation and performance is affected, could that be an acceptable approach? I'm not sure if it is possible though.

jwellbelove commented 3 months ago

Yes, my idea was a limited workaround. I was playing with trying to extend the lambda lifetime with external rather than internal storage. I looked at this a while ago when I, and another person, were discussing having an optional additional template parameter that specified the amount of internal storage that a delegate would create. There were a number of issues with that idea, so it was put to one side.

jwellbelove commented 3 months ago

How do you see the etl::delegate_lambda managing the lambda storage?

jwellbelove commented 3 months ago

An alternative idea could be to define lambda buffers within the class.

  //*************************************
  void MakeDelegates()
  {
    auto lambda1 = [this]() { this->Test1(); };

    ::new (lambda1_buffer) auto(lambda1);

    d1 = etl::delegate<void(void)>(*reinterpret_cast<decltype(lambda1)*>(lambda1_buffer));
  }

  etl::delegate<void(void)> d1;

private:

  char lambda1_buffer[10];
};

jwellbelove commented 3 months ago

If struct S has multiple instances, then as you probably know, it won't work.

Actually, I've just tried multiple instances of S, initialised with different values and it worked as expected.

See https://godbolt.org/z/qWrG5osfc

Output:

ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 0
i = 12
i = 34

#include "etl/delegate.h"
#include "etl/type_traits.h"

#include <iostream>

namespace etl
{
  template<typename T, typename = void>
  struct has_function_operator : etl::false_type {};

  template<typename T>
  struct has_function_operator<T, typename etl::void_t<decltype(&T::operator())>> : etl::true_type {};

  template<typename T>
  constexpr bool has_function_operator_v = has_function_operator<T>::value;

  //*************************************
  template <typename TLambda>
  typename etl::enable_if<etl::has_function_operator<TLambda>::value, TLambda&>::type
    create_static_lambda(TLambda value)
  {
    static TLambda static_value = value;

    return static_value;
  }
}

//*************************************
struct S
{
  //*************************************
  S(int i_)
    : i(i_)
  {
    MakeDelegates();
  }

  //*************************************
  void Test1(int a)
  {
    i *= 2;
    i += a;

    std::cout << "i = " << i << "\n";
  }

  //*************************************
  void MakeDelegates()
  {
    char a = i * 10;

    auto lambda1 = [this, a]() { this->Test1(a); };

    d1 = etl::delegate<void(void)>(etl::create_static_lambda(lambda1));
  }

  int i;

  etl::delegate<void(void)> d1;

private:

};

//*************************************
int main()
{
  S s1(1);
  S s2(2);

  auto d1 = s1.d1;
  auto d2 = s2.d1;

  d1(); // Prints 12
  d2(); // Prints 34

  return 0;
}

jwellbelove commented 3 months ago

You can easily move the create_static_lambda into the lambda/functor delegate constructor.

ETL_CONSTEXPR14 delegate(TLambda& instance)
{
  TLambda& lambda = create_static_lambda(instance);

  assign((void*)(&lambda), lambda_stub<TLambda>);
}

(I can't help thinking there's a flaw in this method somewhere, but I've not seen it yet!)

jwellbelove commented 3 months ago

I realised the error while laying in bed this morning. The output should be 12 & 24, not 12 & 34. (That's what comes of coding too late at night)

devjoa commented 3 months ago

Thanks for your effort. I will take a look at the suggestion. Though I haven't time until the weekend to look at it closer.

jwellbelove commented 3 months ago

The main issue to solve is ensuring that the lambda is stored somewhere that is valid for at least as long as the delegate. That means either storage internal to the delegate or external storage (all without using dynamic memory).

Fixed internal storage would waste memory for non-capturing lambdas. External storage relies on the user to 'do the right thing'. Even packaging up the lambda in a class that forces the user to specify the storage location is not entirely fool-proof.

The only way I can see to make lambdas in delegates fool-proof is to disallow capturing lambdas. (I think this may be possible by testing to see if a lambda type is convertible to a function pointer) Unfortunately this would disallow all functors.

devjoa commented 3 months ago

Finally weekend,

Now I've had a chance to reflect on your suggestions, but none fulfill the functionality I'm searching for. Unfortunately, in both cases, the [this] clauses only reference the s1 object.

What I'm searching for is a solution to put the this pointer of a [this](){this->method()} lambda definition, into the delegate::object storage space. IMHO, it has the same usefulness as assigning member functions via its run time constructor. My approach will probably break the "C++ unused variable optimization" for object (though I don't know if compilers can handle this together with function pointers), and if this is your concern, then my suggestions will most probably fail due to the union solution.

However, I haven't yet done any more research into if objects will be allocated to .rom or not, but the following example code shows that I haven't broken the memory footprint, and the delegate is still trivially destructible with my latest commit, though that with my latest commit to the PR:

#include <etl/delegate.h>
#include <iostream>
#include <type_traits>
#include <cassert>

struct S
{
  S(int i, const char* name_)
    : v{i}, x{i}, name{name_}
  {
  }

  void hello() const
  {
    std::cout << "Hello (" << name << "): " << v << "\n";
  }

  void world()
  {
    x += v;
    std::cout << "world (" << name << "): " << x << "\n";
  }

  void operator()() const {
    std::cout << ".v=" << v << ", .x=" << x << "\n";
  }

private:
  int v;
  int x;
  const char *name;
};

int main()
{
  S s1 {1, "s1"};
  S s2 {10001, "s2"};

  etl::delegate<void(void)> d1 = [&s1](){ s1.hello(); };
  etl::delegate<void(void)> d2 = [&s1](){ s1.world(); };
  etl::delegate<void(void)> d3 = [&s2](){ s2.hello(); };
  etl::delegate<void(void)> d4 = [&s2](){ s2.world(); };

  d1();
  d3();
  d2();
  d4();
  d2();
  d4();

  auto mega1 = [&s1, &s2](){ s1.hello(); s2.world(); };
  auto mega2 = [&s1, &s2](){ s1.world(); s2.hello(); };
  etl::delegate<void(void)> d5 = etl::ref(mega1);
  etl::delegate<void(void)> d6 = etl::ref(mega2);

  std::cout << "======\n";
  d5();
  std::cout << "------\n";
  d6();
  std::cout << "------\n";
  d6();
  std::cout << "------\n";
  d5();
  std::cout << "------\n";
  d6();
  std::cout << "======\n";

  etl::delegate<void(void)> d7 = etl::cref(s1);
  etl::delegate<void(void)> d8 = etl::cref(s2);

  d7();
  d8();

  std::cout << "======\n";

  static_assert(std::is_trivially_destructible_v<decltype(d1)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d2)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d3)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d4)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d5)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d6)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d7)>, "Not compliant to req if fail");
  static_assert(std::is_trivially_destructible_v<decltype(d8)>, "Not compliant to req if fail");

  static_assert(sizeof(d1) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d2) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d3) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d4) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d5) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d6) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d7) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");
  static_assert(sizeof(d8) == (sizeof(void*) + sizeof(void(*)(void))),  "Not compliant to req if fail");

  return 0;
}

jwellbelove commented 3 months ago

If all you're really interested in is achieving functionality you get by using lambdas like [&s1](){ s1.hello(); then why not just use the standard etl::delegate API?

#include <etl/delegate.h>
#include <iostream>
#include <type_traits>
#include <cassert>

struct S
{
  S(int i, const char* name_)
    : v{ i }, x{ i }, name{ name_ }
  {
  }

  void hello() const
  {
    std::cout << "Hello (" << name << "): " << v << "\n";
  }

  void world()
  {
    x += v;
    std::cout << "world (" << name << "): " << x << "\n";
  }

  void operator()() const {
    std::cout << ".v=" << v << ", .x=" << x << "\n";
  }

private:
  int v;
  int x;
  const char* name;
};

int main()
{
  S s1{ 1, "s1" };
  S s2{ 10001, "s2" };

  auto d1 = etl::delegate<void(void)>::create<S, &S::hello>(s1);
  auto d2 = etl::delegate<void(void)>::create<S, &S::world>(s1);
  auto d3 = etl::delegate<void(void)>::create<S, &S::hello>(s2);
  auto d4 = etl::delegate<void(void)>::create<S, &S::world>(s2);

  d1();
  d3();
  d2();
  d4();
  d2();
  d4();
}

jwellbelove commented 3 months ago

You can even make it compile time by moving the instances.

S s1{ 1, "s1" };
S s2{ 10001, "s2" };

int main()
{
  constexpr auto d1 = etl::delegate<void(void)>::create<S, s1, &S::hello>();
  constexpr auto d2 = etl::delegate<void(void)>::create<S, s1, &S::world>();
  constexpr auto d3 = etl::delegate<void(void)>::create<S, s2, &S::hello>();
  constexpr auto d4 = etl::delegate<void(void)>::create<S, s2, &S::world>();

  d1();
  d3();
  d2();
  d4();
  d2();
  d4();
}

main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:s1
        call    S::hello() const
        mov     edi, OFFSET FLAT:s2
        call    S::hello() const
        mov     edi, OFFSET FLAT:s1
        call    S::world()
        mov     edi, OFFSET FLAT:s2
        call    S::world()
        mov     edi, OFFSET FLAT:s1
        call    S::world()
        mov     edi, OFFSET FLAT:s2
        call    S::world()
        xor     eax, eax
        add     rsp, 8
        ret

devjoa commented 3 months ago

Well, my example is intended to show that the PR in its current state doesn't break the memory footprint nor the trivially destructible constraint of the class. It's not a real-world example. I'm well aware of the run-time member function assignment, it's one of the most used ways to assign delegates in my code base because I'm not a big fan of singletons.

What I'm now suggesting, but haven't implemented yet in the PR is an alternative way to use small lambda captures, without breaking the current API. I'm thinking about to use something like this:

etl::delegate<void(void)> d1 = etl::realize([this](){ /* do something here */  }};

where etl::realize(...) is a new std::functional inspired opposite of std::reference_wrapper i.e. a "realize_wrapper" for lambda and functor (because I don't know a way how to distinguish between these two). Just for clarity, I called this function std::lambda_wrapper in an earlier post, but I think "realize" may be a better name (open for discussion).

However, do you see a plausible chance of accepting such PR or do you feel quite satisfied with the current API? If the latter is true, we better close this PR; and if not, I'm glad and open to elaborate further on the topic.

jwellbelove commented 3 months ago

I'm open to ideas about how to solve this, it's just that I'm just not entirely happy with the solutions so far.

Like I said earlier, the issue with internal delegate storage is that it will always be a fixed size, and so either wastes storage or will not be big enough for larger lambdas. Every time the coder creates a lambda with a larger size than the previous max, all of the delegates would have to declare more storage. If this wasn't the ETL then we could simply use dynamically allocated memory, job done!

I think that a lambda/functor wrapper of some sort that creates static external storage for the object is the only real way forward (unless someone can come up with a better solution). The lambda/functor API could be modified to accept a reference to an instance of the wrapper.

devjoa commented 3 months ago

Hi @jwellbelove , sorry for the low activity from me for a while. I still don't see the difference between allowing lambda captures with the size of 'void*' and references to functors, lambdas, methods, etc. The size limits won't be changed. It's just a convenience thing in my eyes.

However, I close this PR now as it doesn't match the manifesto of ETL. Cheers for now. :smiley:

ETLCPP / etl

Draft: Add Small Buffer-Optimization for lambda and functor created delegates (C11) #867