TheCherno / Hazel

Hazel Engine
Apache License 2.0

Avoid using std::function in event dispatcher for performance #85

Closed Chlorie closed 5 years ago

Chlorie commented 5 years ago

Currently Hazel uses std::function as the callback parameter of the event dispatcher. https://github.com/TheCherno/Hazel/blob/5dc2ee187fae6a3efa1d13b61faf5ca67e3638bd/Hazel/src/Hazel/Events/Event.h#L64-L73 But as we all know, std::function uses type erasure under the hood, which makes this approach inefficient when high performance is needed:

  1. The size of a class (here std::function<R(Ts...)>) is fixed at compile time, but since any functor with a matching signature can be used to construct the std::function object, a dynamic allocation may be needed whenever the functor doesn't fit in the small-buffer storage. Furthermore, if the functor is large (e.g. a lambda with several by-value captures), a large chunk of memory must be copied as well. None of this is necessary here.

  2. Modern compilers are really good at inlining trivial function calls: when we pass a simple lambda to a consumer function that calls it, the call can often be inlined and thus costs nothing. But behind std::function's type-erasure mechanism the compiler usually cannot see the actual callee, so it cannot inline the call, which makes things even worse.

Therefore I propose a simple solution to this problem: just template on the type of the functor.

template <typename T, typename F>
bool Dispatch(const F& func)
{
    if (m_Event.GetEventType() == T::GetStaticType())
    {
    // BTW you can simply cast the event reference to T&; no need to take the address and dereference it
        m_Event.Handled = func(static_cast<T&>(m_Event));  
        return true;
    }
    return false;
}

The F parameter can be deduced by the compiler, and the user has to pass in T anyway.

Plus, for a better user interface, even the parameter T can be deduced automatically with a few tricks:

namespace detail
{
    // These helpers extract the event parameter type T from the functor type F
    // The two functions need no definitions, since only their signatures are used (in an unevaluated context)
    template <typename F, typename T>
    T dispatch_helper(bool (F::*)(T&));
    template <typename F, typename T>
    T dispatch_helper(bool (F::*)(T&) const);
    template <typename F>
    using DispatchEventType = decltype(dispatch_helper(&F::operator()));
}
...
// The second template parameter SFINAE's out functors that are not event handlers
// So passing in just a random functor is a compile error, which is neat
template <typename F, typename = decltype(detail::DispatchEventType<F>::GetStaticType())>
bool Dispatch(F&& func)
{
    using T = detail::DispatchEventType<F>;  // Use the helper to get the parameter type
    if (m_Event.GetEventType() == T::GetStaticType())
    {
        m_Event.Handled = std::forward<F>(func)(static_cast<T&>(m_Event));
        return true;
    }
    return false;
}

And now everything is solved!

ilikenoodles2 commented 5 years ago

I'd like to note that when timing Application's OnEvent, I get an average time of 0.0119ms, whereas when templating the function, the average time is 0.0095ms. Definitely an improvement.

Edit: To clarify, this was with only one call to Dispatch

WhoseTheNerd commented 5 years ago

As @ilikenoodles2 benchmarked it, the speed improvement is definitely very small. Only 2.4 microseconds. So this is likely a micro optimization. "Premature optimization is the root of all evil".

Chlorie commented 5 years ago

> As @ilikenoodles2 benchmarked it, the speed improvement is definitely very small. Only 2.4 microseconds. So this is likely a micro optimization. "Premature optimization is the root of all evil".

But I don't think a 20%+ improvement is "micro optimization", since the event system is used all over the code. Plus making it a template doesn't cost you anything, and it gives you better performance together with better IDE diagnostics. (Those red lines under your code are way better than matryoshka template substitution errors after you hit the compile button.) I agree that premature optimization is the root of all evil, but which optimizations count as premature is a matter of opinion. If you're just spending time trying to do things right, then in my humble opinion it is not premature.

LovelySanta commented 5 years ago

In my opinion, an optimization must be measured by its relative improvement rather than the absolute time saved.

As @WhoseTheNerd said, a 2.4us gain might not sound like a lot, but on an operation that only took 11.9us to begin with, it is an improvement of more than 20%, as @Chlorie said.

I can totally understand that, for example, #76 is a premature optimization, but this one has a great impact for the functionality it provides.

If it were up to me, I would approve starting a PR to implement this...

WhoseTheNerd commented 5 years ago

It's been almost a month and the PR is still not implemented. :thinking:

CleverSource commented 5 years ago

No one has even made a PR for this. OP should consider opening one, or someone else could; I'd be up for it, though.