Ableton / link

Ableton Link
Other
1.09k stars 149 forks source link

Memory leak in Esp32 when not connected to WiFi #74

Closed mathiasbredholt closed 4 years ago

mathiasbredholt commented 4 years ago

esp-idf v4.1 I noticed a memory leak in the Esp32 example and I managed to narrow it down to the std::thread being called in LockFreeCallbackDispatcher. By removing the thread as a member, the memory leak was stopped. I'm not sure what the cause of the leak is, but I'm thinking that maybe we could find a way to make the Esp32 version work without threads at all?

fgo-ableton commented 4 years ago

So this does not happen when using captureAppSessionState()?

mathiasbredholt commented 4 years ago

Hmm, now I suddenly can't reproduce the memory leak in the example.. It only appears in my own project even if I don't use captureAudioSessionState. At the moment the only thing, that makes it stop is disabling the LockFreeCallbackDispatcher. I will do some more tests, and see what happens.

mathiasbredholt commented 4 years ago

I found that the memory leak happens when the Esp32 is not connected to a WiFi. It happens from just creating a ableton::Link instance. I think it's caused by the underlying asio port. I will keep you updated!

fgo-ableton commented 4 years ago

@mathiasbredholt If you have a reproducible scenario using the example app let me know. I have not been able to see a leak so far.

mathiasbredholt commented 4 years ago

@fgo-ableton Here is a scenario when it happens. https://github.com/mathiasbredholt/link/blob/memory-leak/examples/esp32/main/main.cpp

This is the scenario when a Link-enabled device is turned on in an area where there's no WiFi. In that case the device will still instantiate the Link object and this causes the leak. The poll function crashes the Esp32 when it's not connected to a network, so I disabled it for this situation. Is this what's causing the leak? Does poll need to be called to repeatedly to free the allocated memory? In that case we should find a way to prevent the poll function to crash when there's no connection.

mathiasbredholt commented 4 years ago

poll throws the following exception when WiFi is not connected:

E (598) esp32_asio_exception: Caught exception: set_option: Address not available!
abort() was called at PC 0x400e9307 on core 0
0x400e9307: void asio::detail::throw_exception<std::system_error>(std::system_error const&) at /Users/mathias/esp/esp-idf/components/asio/port/include/esp_exception.h:34

ELF file SHA256: 39304b43c6ec803739b8c6f4f2ba686a1b1b64c5842899fcb17fe82eb25ad4fa

Backtrace: 0x40085d65:0x3ffb86e0 0x400860dd:0x3ffb8700 0x400e9307:0x3ffb8720 0x400e949a:0x3ffb8740 0x400e16d3:0x3ffb87c0 0x400e1965:0x3ffb8950 0x400e19aa:0x3ffb8980 0x400e222a:0x3ffb8aa0 0x400e23e1:0x3ffb8d10 0x400e252d:0x3ffb8de0 0x400e2676:0x3ffb8e40 0x400ea185:0x3ffb8e80 0x400ea2f1:0x3ffb8ec0 0x400ea361:0x3ffb8f10 0x400d6f0c:0x3ffb8f40 0x400d2a5e:0x3ffb8f70 0x400878ed:0x3ffb8f90
0x40085d65: invoke_abort at /Users/mathias/esp/esp-idf/components/esp32/panic.c:155

0x400860dd: abort at /Users/mathias/esp/esp-idf/components/esp32/panic.c:172

0x400e9307: void asio::detail::throw_exception<std::system_error>(std::system_error const&) at /Users/mathias/esp/esp-idf/components/asio/port/include/esp_exception.h:34

0x400e949a: asio::detail::do_throw_error(std::error_code const&, char const*) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/impl/throw_error.ipp:49

0x400e16d3: ableton::platforms::asio::Socket<512u> ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>::openMulticastSocket<512u>(asio::ip::address_v4 const&) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/throw_error.hpp:41
 (inlined by) ?? at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/basic_socket.hpp:936
 (inlined by) ableton::platforms::asio::Socket<512u> ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>::openMulticastSocket<512u>(asio::ip::address_v4 const&) at /Users/mathias/Documents/software/link/include/ableton/platforms/esp32/Context.hpp:92

0x400e1965: ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u> ableton::discovery::makeIpV4Interface<512u, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>(ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>, asio::ip::address_v4 const&) at /Users/mathias/Documents/software/link/include/ableton/discovery/IpV4Interface.hpp:52
 (inlined by) ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u> ableton::discovery::makeIpV4Interface<512u, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>(ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>, asio::ip::address_v4 const&) at /Users/mathias/Documents/software/link/include/ableton/discovery/IpV4Interface.hpp:119

0x400e19aa: ableton::discovery::PeerGateway<ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::util::Injected<ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::type&>::type&, 512u>, ableton::link::PeerState, ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::type&>, ableton::link::Peers<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, std::reference_wrapper<ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionPeerCounter>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionTimelineCallback, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionStartStopStateCallback>::GatewayObserver, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&> ableton::discovery::makeIpV4Gateway<ableton::link::Peers<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, std::reference_wrapper<ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionPeerCounter>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionTimelineCallback, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionStartStopStateCallback>::GatewayObserver, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>(ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>, asio::ip::address_v4 const&, ableton::util::Injected<ableton::link::Peers<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, std::reference_wrapper<ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionPeerCounter>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionTimelineCallback, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::SessionStartStopStateCallback>::GatewayObserver>, ableton::link::PeerState) at /Users/mathias/Documents/software/link/include/ableton/discovery/PeerGateway.hpp:243

0x400e222a: ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory::operator()(std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>, asio::ip::address const&) at /Users/mathias/Documents/software/link/include/ableton/link/Gateway.hpp:50
 (inlined by) ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory::operator()(std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::util::Injected<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>, asio::ip::address const&) at /Users/mathias/Documents/software/link/include/ableton/link/Controller.hpp:705

0x400e23e1: void ableton::discovery::PeerGateways<std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::Callback::operator()<std::vector<asio::ip::address, std::allocator<asio::ip::address> > >(std::vector<asio::ip::address, std::allocator<asio::ip::address> > const&) at /Users/mathias/Documents/software/link/include/ableton/discovery/PeerGateways.hpp:172 (discriminator 5)

0x400e252d: ableton::discovery::InterfaceScanner<std::shared_ptr<ableton::discovery::PeerGateways<std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::Callback>, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::scan() at /Users/mathias/Documents/software/link/include/ableton/discovery/InterfaceScanner.hpp:72

0x400e2676: asio::detail::completion_handler<ableton::discovery::PeerGateways<std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::enable(bool)::{lambda()#1}>::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned int) at /Users/mathias/Documents/software/link/include/ableton/discovery/InterfaceScanner.hpp:54
 (inlined by) ableton::discovery::PeerGateways<std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::enable(bool)::{lambda()#1}::operator()() const at /Users/mathias/Documents/software/link/include/ableton/discovery/PeerGateways.hpp:76
 (inlined by) ?? at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/handler_invoke_hook.hpp:68
 (inlined by) ?? at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37
 (inlined by) ?? at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/handler_work.hpp:81
 (inlined by) asio::detail::completion_handler<ableton::discovery::PeerGateways<std::pair<ableton::link::NodeState, ableton::link::GhostXForm>, ableton::link::Controller<std::function<void (unsigned int)>, std::function<void (ableton::link::Tempo)>, std::function<void (bool)>, ableton::platforms::esp32::Clock, ableton::platforms::esp32::Random, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::GatewayFactory, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog> >::enable(bool)::{lambda()#1}>::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned int) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/completion_handler.hpp:69

0x400ea185: asio::detail::scheduler::do_poll_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::error_code const&) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/scheduler_operation.hpp:39
 (inlined by) asio::detail::scheduler::do_poll_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::error_code const&) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/impl/scheduler.ipp:534

0x400ea2f1: asio::detail::scheduler::poll(std::error_code&) at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/detail/impl/scheduler.ipp:220 (discriminator 1)

0x400ea361: asio::io_context::poll() at /Users/mathias/esp/esp-idf/components/asio/asio/asio/include/asio/impl/io_context.ipp:91

0x400d6f0c: app_main at /Users/mathias/Documents/software/link/include/ableton/platforms/esp32/Context.hpp:131
 (inlined by) app_main at /Users/mathias/Documents/software/link/examples/esp32/build/../main/main.cpp:112

0x400d2a5e: main_task at /Users/mathias/esp/esp-idf/components/esp32/cpu_start.c:554

0x400878ed: vPortTaskWrapper at /Users/mathias/esp/esp-idf/components/freertos/port.c:143
mathiasbredholt commented 4 years ago

Ok, I tried disabling the exception in esp-idf/components/asio/port/include/esp_exception.h:

//
// asio exception stub
//
namespace asio {
namespace detail {
template <typename Exception>
void throw_exception(const Exception& e)
{
  // ESP_LOGE("esp32_asio_exception", "Caught exception: %s!", e.what());   
  // abort();
}
}}
#endif // CONFIG_COMPILER_CXX_EXCEPTIONS==1 && defined (ASIO_NO_EXCEPTIONS)

Now I can call poll without crashing if the Esp32 is not connected to a network.

fgo-ableton commented 4 years ago

@mathiasbredholt I pushed a branch with some changes: https://github.com/Ableton/link/tree/esp-fixes The first commit fixes the crash when not being connected to the WIFI, by simply checking if the interface is up before using it. I also removed the necessity to poll the Link Context from the main thread. It still does not use the exception handlers that the non-ESP-platforms use, but just fails silently. On the other platforms we run multiple instances of asio::io_service, but I did not manage to get that working reliably on the ESP. It would be cool if you could give it a try, and let me know if it works for you.

mathiasbredholt commented 4 years ago

@fgo-ableton That's great news! Great work, really good idea with the interface checking, and also I think it's a good decision to replace the pthreads with tasks. I will try it out right away!

mathiasbredholt commented 4 years ago

@fgo-ableton I tested it and it works great :) :) :) Thanks for fixing this issue. In my tests I found another issue though when the WiFi connection is poor. The UdpMessenger throws an exception if the sendUdpMessage fails, which causes the ESP32 to abort. By enabling C++ exceptions in menuconfig I can see that the exception is thrown when it's trying to send the peer state (UdpMessenger.hpp:223). Should we catch the exception as in the destructor of UdpMessenger?

The exception is printed below:

abort() was called at PC 0x4014eb87 on core 1
0x4014eb87: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/HOST-x86_64-apple-darwin12/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47

ELF file SHA256: 497c4e18a0eb4ffcc6fe7a17b247bf9631eeb74a049b6285d035157aa81ade21

Backtrace: 0x4008bb11:0x3ffe5a10 0x4008be31:0x3ffe5a30 0x4014eb87:0x3ffe5a50 0x4014ebce:0x3ffe5a70 0x4014f77f:0x3ffe5a90 0x400e1075:0x3ffe5ab0 0x400e13c5:0x3ffe5d70 0x400e1536:0x3ffe60b0 0x400e18b9:0x3ffe6150 0x400db07f:0x3ffe6180 0x400de7a7:0x3ffe61b0 0x400f8589:0x3ffe6200 0x400f86a2:0x3ffe6240 0x400f8755:0x3ffe6290 0x400d6452:0x3ffe62c0 0x40090bd5:0x3ffe62e0
0x4008bb11: invoke_abort at /Users/mathiasbredholt/esp/esp-idf/components/esp32/panic.c:155

0x4008be31: abort at /Users/mathiasbredholt/esp/esp-idf/components/esp32/panic.c:172

0x4014eb87: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/HOST-x86_64-apple-darwin12/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47

0x4014ebce: std::terminate() at /builds/idf/crosstool-NG/.build/HOST-x86_64-apple-darwin12/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57

0x4014f77f: __cxa_throw at /builds/idf/crosstool-NG/.build/HOST-x86_64-apple-darwin12/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95

0x400e1075: void ableton::discovery::sendUdpMessage<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::NodeId, ableton::discovery::Payload<ableton::link::Timeline, ableton::discovery::Payload<ableton::link::SessionMembership, ableton::discovery::Payload<ableton::link::StartStopState, ableton::discovery::Payload<ableton::link::MeasurementEndpointV4, ableton::discovery::Payload<> > > > > >(ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>&, ableton::link::NodeId, unsigned char, unsigned char, ableton::discovery::Payload<ableton::link::Timeline, ableton::discovery::Payload<ableton::link::SessionMembership, ableton::discovery::Payload<ableton::link::StartStopState, ableton::discovery::Payload<ableton::link::MeasurementEndpointV4, ableton::discovery::Payload<> > > > > const&, asio::ip::basic_endpoint<asio::ip::udp> const&) at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/discovery/UdpMessenger.hpp:70 (discriminator 6)

0x400e13c5: ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::sendPeerState(unsigned char, asio::ip::basic_endpoint<asio::ip::udp> const&) at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/discovery/UdpMessenger.hpp:223

0x400e1536: ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState() at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/discovery/UdpMessenger.hpp:216

0x400e18b9: ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState()::{lambda(std::error_code)#1}::operator()(std::error_code) const at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/discovery/UdpMessenger.hpp:208
 (inlined by) ableton::platforms::asio::AsioTimer::AsyncHandler::operator=<ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState()::{lambda(std::error_code)#1}>(ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState()::{lambda(std::error_code)#1})::{lambda(std::error_code)#1}::operator()(std::error_code) const at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/platforms/asio/AsioTimer.hpp:111
 (inlined by) std::_Function_handler<void (std::error_code), ableton::platforms::asio::AsioTimer::AsyncHandler::operator=<ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState()::{lambda(std::error_code)#1}>(ableton::discovery::UdpMessenger<ableton::discovery::IpV4Interface<ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&, 512u>, ableton::link::PeerState, ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>&>::Impl::broadcastState()::{lambda(std::error_code)#1})::{lambda(std::error_code)#1}>::_M_invoke(std::_Any_data const&, std::error_code&&) at /Users/mathiasbredholt/.espressif/tools/xtensa-esp32-elf/esp-2019r2-8.2.0/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/8.2.0/bits/std_function.h:297

0x400db07f: std::function<void (std::error_code)>::operator()(std::error_code) const at /Users/mathiasbredholt/.espressif/tools/xtensa-esp32-elf/esp-2019r2-8.2.0/xtensa-esp32-elf/xtensa-esp32-elf/include/c++/8.2.0/bits/std_function.h:687

0x400de7a7: asio::detail::wait_handler<ableton::util::SafeAsyncHandler<ableton::platforms::asio::AsioTimer::AsyncHandler> >::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned int) at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/platforms/asio/AsioTimer.hpp:119
 (inlined by) ?? at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/util/SafeAsyncHandler.hpp:51
 (inlined by) ?? at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/bind_handler.hpp:64
 (inlined by) ?? at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/handler_invoke_hook.hpp:68
 (inlined by) ?? at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/handler_invoke_helpers.hpp:37
 (inlined by) ?? at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/handler_work.hpp:81
 (inlined by) asio::detail::wait_handler<ableton::util::SafeAsyncHandler<ableton::platforms::asio::AsioTimer::AsyncHandler> >::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned int) at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/wait_handler.hpp:71

0x400f8589: asio::detail::scheduler::do_poll_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::error_code const&) at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/scheduler_operation.hpp:39
 (inlined by) asio::detail::scheduler::do_poll_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::error_code const&) at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/impl/scheduler.ipp:534

0x400f86a2: asio::detail::scheduler::poll(std::error_code&) at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/detail/impl/scheduler.ipp:220

0x400f8755: asio::io_context::poll() at /Users/mathiasbredholt/esp/esp-idf/components/asio/asio/asio/include/asio/impl/io_context.ipp:91

0x400d6452: ableton::platforms::esp32::Context<ableton::platforms::esp32::ScanIpIfAddrs, ableton::util::NullLog>::ServiceRunner::run(void*) at /Users/mathiasbredholt/Documents/sw/devel/torso/external/link/include/ableton/platforms/esp32/Context.hpp:50

0x40090bd5: vPortTaskWrapper at /Users/mathiasbredholt/esp/esp-idf/components/freertos/port.c:143
fgo-ableton commented 4 years ago

@mathiasbredholt Ja. That version is not handling any exceptions. I pushed a new version to the branch. This one should reset the gateway when an exception is thrown on a socket. It actually does pretty much what we are doing for the other platforms. With the difference that we start a task instead of a thread per context. That makes it possible to handle the exceptions the same way we do on the other platforms. However, this does not run very stable for me. Maybe you can give it a try and tweak the sdkconfig to make it work?

mathiasbredholt commented 4 years ago

@fgo-ableton I ran the this commit and I had some issues as well. I think it will perform better if we can run Link in a single task? Would it be possible to combine the solution in the master branch and this branch such that all contexts run in the same task? It would still make sense to create the task in Context.hpp I think. Also, it seems that this branch is opening more sockets than the master branch, as I'm getting a too many sockets (ESP-IDF allows max 16) error in some of my Link ESP32 applications, this seems to be related to the Contexts.

Also, I think we should portYIELD() instead of vTaskDelay(1). This should perform much better as it forces the scheduler to do a context-switch immediately instead of waiting 1 ms.

mathiasbredholt commented 4 years ago

Gave it a go here, this one works for me. https://github.com/mathiasbredholt/link/blob/esp-fixes/include/ableton/platforms/esp32/Context.hpp

fgo-ableton commented 4 years ago

@mathiasbredholt Thanks for testing! I agree, that running everything in a single task seems to scale better. Have you tested your example with weak wifi compatibility? You're using the same exception handler for all contexts. So if that actually uses the right exception handler, it would be a coincidence. I think we somehow have to use the proper handler when calling Context::async(Handler handler). ATM I am not sure, what would be the best approach for that.

mathiasbredholt commented 4 years ago

@fgo-ableton I did do a quick test of the esp32 example with weak wifi it seemed to work but more tests should be done. I have very little experience with exception handling, but I will read up on it. I will look through the asio documentation.

fgo-ableton commented 4 years ago

@mathiasbredholt There is nothing special about exception handling we are doing here. This is more about how we use asio::io_service. When we construct a Context (that usually contains it's own io_service and thread) we optionally pass an exception handler. When we construct the networking part of Link, we pass a handler that restarts the networking on an interface in case an exception is thrown. When using the default constructor for a Context we just ignore any thrown exception.

mathiasbredholt commented 4 years ago

@fgo-ableton So I been trying to fix the crashes and I found out that some of the crashes were due to the InterfaceScanner finding 0.0.0.0 for STA interface. I've updated the InterfaceScanner such that it doesn't accept 0.0.0.0. For the exceptions, I suggest that we use a simple catch-all handler, as the esp framework already handles re-connection in poor signal conditions. I submitted a pull request to the esp-fixes branch https://github.com/Ableton/link/pull/76.

fgo-ableton commented 4 years ago

@mathiasbredholt I am a bit confused by the stuff that is in your PR. It seems to do a lot more than checking the IP address.

I tweaked my branch from a couple of weeks ago a little and rebased it onto master. This one had a catch all exception handler. Could you check if that works for you? If you have suggestions for changes let me know.

The branch does not check the IP address, but it checks that the interface is up. This seems to work for me. Are you sure checking the IP address is necessary?

mathiasbredholt commented 4 years ago

@fgo-ableton I see, there's also a lot of changes in there. I can make a pull request for each change, describing the reasoning behind each? I look through your tweaks quickly and I have a couple of comments:

IP address:

Task affinity:

Single task and service for all Contexts

sdkconfig.defaults instead of sdkconfig in example

Using task notifications instead of semaphores

We've been running thorough testing with my version, and we've not been able to make the Esp32 crash from any Link related code.

fgo-ableton commented 4 years ago

@mathiasbredholt Thanks. This makes it way clearer to me.

IP address:

* It seems that even if the interface is up, `tcpip_adapter_get_ip_info`  will in some cases (it seems to be when WiFi connection is weak) return the IP address 0.0.0.0, this will cause a crash later on, as some of the lwip API functions will fail, when a socket is bound to this address.

* Also, I can't think of a case where this IP address should be used as the gateway in a Link session. Let me know if you can come up with one.

I don't know about the gateway. But I get the point about checking for 0.0.0.0. Makes sense to add it then.

Task affinity:

* After studying the esp-idf framework, I argue that Link should be running in dual-core mode per default for the following reasons

  * TCP/IP task implemented through lwip is running in dual-core mode per default
  * It has better performance due to parallelism
  * It is the most flexible mode for different project configurations as it allows Link to run even if a single task is preempting all the other tasks on one of the cores (as in the case of the arduino-esp32, the Arduino environment for Esp32

* In this mode, Link could be made into an Arduino library for the development of interesting open-source Link-enabled Digital Musical Instruments.

* At least it should be pinned to core 0 as the Arduino environment runs on core 1.

Single task and service for all Contexts

* Each task adds to scheduling overhead, as the scheduler needs to run before the task can be executed and also it seems that each asio::service is using a socket, which limits the possibilities for using sockets for other purposes simultaneously with running Link.

* This is debatable as I don't have a deep understanding of the implications of running everything within a single service, but I believe that there is a significant boost in performance to be gained from this.

In the Link code base, everything that is running in the same context should never be accessed from multiple threads at the same time. This could cause race conditions. That is why I tried to run one task per context earlier. If that would not cause so much overhead, it would allow to parallelize in a memory safe way. But when using one task, we will never really run anything in parallel, right? Even if we don't tie the task to a core, it could either run on one core or the other, but it will never run on both at the same time. This would maybe be beneficial for Link, as there might be less situations where it has to wait for resources. But this is also less predictable for the application developer. Wouldn't it make sense to have a thread where I have a better chance to generate the audio-relevant stuff without potentially being blocked by Link networking?

portYIELD() in LockFreeCallbackDispatcher

* Using `portYIELD()` instead of `vTaskDelay(1)`, allows Link to be tick-frequency independent, which is advantageous for different project configurations. It also allows any pending tasks to be executed immediately, instead of waiting for the next tick.

Makes sense!

sdkconfig.defaults instead of sdkconfig in example

* I found that with the changes above, the only necessary changes to sdkconfig for esp-idf v4.0 is to enable C++ exceptions with `CONFIG_COMPILER_CXX_EXCEPTIONS=y`

* The esp-idf allows for creating a settings file `sdkconfig.defaults` containing the only necessary settings for the project. This way, the sdkconfig can be added to .gitignore, allowing for  transparency of which settings that are necessary to change for Link to work.

Make sense too!

Using task notifications instead of semaphores

* According to the official FreeRTOS documentation:

* > Unblocking an RTOS task with a direct notification is 45% faster * and uses less RAM than unblocking a task with a binary semaphore.

I see. So we should do that in the callback dispatcher and probably also in the example app.

We've been running thorough testing with my version, and we've not been able to make the Esp32 crash from any Link related code.

😄

mathiasbredholt commented 4 years ago

@fgo-ableton Thanks for your reply! A few comments:

But when using one task, we will never really run anything in parallel, right? Even if we don't tie the task to a core, it could either run on one core or the other, but it will never run on both at the same time.

You're right. I guess my argument about running in parallel didnt't make much sense. But I still think Link shouldn't be tied to a core. Alternatively, we could make it a user setting using a Kconfig file? xTaskCreatePinnedToCore allows for the arguments 0, 1, and tskNO_AFFINITY.

This would maybe be beneficial for Link, as there might be less situations where it has to wait for resources. But this is also less predictable for the application developer. Wouldn't it make sense to have a thread where I have a better chance to generate the audio-relevant stuff without potentially being blocked by Link networking?

You're right about this one too. I think it makes sense. Maybe the scheduling overhead is insignificant versus the Link networking block time? Could be interesting to test.

I will test your version try to get some information about tasks running, CPU usage etc.

fgo-ableton commented 4 years ago

@mathiasbredholt I just pushed a commit that allows setting the core via CMake. I.e.: idf.py build -DLINK_ESP_TASK_CORE_ID=tskNO_AFFINITY The default is still 1.

mathiasbredholt commented 4 years ago

@fgo-ableton Great, thanks! That works for me! I see you added the other changes too 💯 💯 💯

mathiasbredholt commented 4 years ago

@fgo-ableton A small detail, the sdkconfig should be sdkconfig.defaults and not sdkconfig.default

mathiasbredholt commented 4 years ago

@fgo-ableton So I tested your version, and I still sometimes get a crash from the UdpMessenger throwing an unhandled UdpSendException.

mathiasbredholt commented 4 years ago

Do you know if there's any difference between

catch (const std::exception& e)
{
}

and

catch (...)
{
}
fgo-ableton commented 4 years ago

Sort of... The second version should not just work for std::exceptions but for other arguments as well. I'm not sure atm though what happens, when an UdpSendException is thrown and the catch expects a std::exception. But apparently it throws... Does that fix the issue?

mathiasbredholt commented 4 years ago

Yes it seems that it fixes the issue.. Strange..

fgo-ableton commented 4 years ago

Actually not so strange: http://www.cplusplus.com/doc/tutorial/exceptions/ Makes sense. I'll push a fixup

mathiasbredholt commented 4 years ago

@fgo-ableton My solution also came from the cpp tutorials on exceptions. But great, that sounds good.

mathiasbredholt commented 4 years ago

@fgo-ableton I was able to get some run time stats by enabling the option in the FreeRTOS settings under sdkconfig, and using vTaskGetRunTimeStats in the printTask. Here are the results. The number in the middle is the absolute execution time in the task and the percentage is the percentage of total system execution time. I had to change the priority of the printTask to 10 (or something higher than 2, the Link priority) for the test with the Link pinned to core 1, otherwise it would never run.

The test was done @240 MHz with release optimization level.

Pinned to core 1

This one I couldn't get to work as the tick task pre-empts the Link task due to the high priority and frequency of the tick task. Link was not connecting to my computer.

tick            21267094        40%
print           312512      <1%
link            2417447     4%
IDLE0           23759490        44%
IDLE1           1995646     3%
link            28558       <1%
tiT             1055686     1%
Tmr Svc         35      <1%
sys_evt         14075       <1%
esp_timer       68046       <1%
wifi            1747702     3%
ipc0            20489       <1%
ipc1            62582       <1%

Pinned to core 0

tick            9207956     39%
print           122184      <1%
link            8101476     34%
IDLE1           2377238     10%
link            2558        <1%
IDLE0           1735376     7%
tiT             838119      3%
Tmr Svc         34      <1%
ipc1            62579       <1%
esp_timer       31797       <1%
wifi            974098      4%
sys_evt         12738       <1%
ipc0            20484       <1%

No affinity

tick            11378184        40%
print           138780      <1%
link            9943166     35%
IDLE1           2475851     8%
link            3509        <1%
IDLE0           1749610     6%
tiT             992428      3%
Tmr Svc         39      <1%
ipc0            20484       <1%
esp_timer       39903       <1%
wifi            1320419     4%
ipc1            62604       <1%
sys_evt         14090       <1%
fgo-ableton commented 4 years ago

@mathiasbredholt Thanks! Yes, pinning Link to the same core as the tick task wasn't gonna work... I'll remove the pinning, also for the task in main.cpp, and leave the option to enable it.

mathiasbredholt commented 4 years ago

@fgo-ableton Great! That sounds like a nice solution!

mathiasbredholt commented 4 years ago

@fgo-ableton Also, I found that the bswap64 is implemented in esp-idf under endian.h, so Esp32.hpp could be simplified to

/* Copyright 2019, Mathias Bredholt, Torso Electronics, Copenhagen. All rights reserved.
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation, either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#pragma once

#include <endian.h>

#ifndef ntohll
#define ntohll(x) bswap64(x)
#endif

#ifndef htonll
#define htonll(x) bswap64(x)
#endif
fgo-ableton commented 4 years ago

@mathiasbredholt I just merged the changes to master.

mathiasbredholt commented 4 years ago

@fgo-ableton Thanks! Looks great, I think we are getting close to a stable version.