Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

[7.14] Slow coroutine step execution: TunnelCommunity.do_remove() #8031

Status: Open. kozlovsky opened this issue 1 month ago.


After slow coroutine stack tracing was enabled in the debug version of the 7.14 release, several slow coroutine steps were detected.

One traceback that regularly appears is:

  File "src\<relief>\core\start_core.py", line 206, in run_core
  File "src\<relief>\core\start_core.py", line 169, in run_tribler_core_session
  File "Python38\lib\asyncio\base_events.py", line 603, in run_until_complete
  File "Python38\lib\asyncio\windows_events.py", line 316, in run_forever
  File "Python38\lib\asyncio\base_events.py", line 570, in run_forever
  File "Python38\lib\asyncio\base_events.py", line 1859, in _run_once
  File "core\utilities\slow_coro_detection\patch.py", line 37, in patched_handle_run
  File "Python38\lib\asyncio\events.py", line 81, in _run
  File "Python38\lib\site-packages\ipv8\taskmanager.py", line 18, in interval_runner
  File "Python38\lib\site-packages\ipv8\messaging\anonymization\hidden_services.py", line 194, in do_circuits
  File "Python38\lib\site-packages\ipv8\messaging\anonymization\community.py", line 186, in do_circuits

The last frame points to line 186 of community.py, the do_remove() call at the end of do_circuits:

    def do_circuits(self):
        for circuit_length, num_circuits in self.circuits_needed.items():
            num_to_build = max(0, num_circuits - len(self.find_circuits(state=None, hops=circuit_length)))
            self.logger.info("Want %d data circuits of length %d", num_to_build, circuit_length)
            for _ in range(num_to_build):
                if not self.create_circuit(circuit_length):
                    self.logger.info("circuit creation of %d circuits failed, no need to continue", num_to_build)
                    break
        self.do_remove()  # <-- line 186
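The traceback above contains the `patched_handle_run` frame, so it must have been captured while the slow step was still on the stack. One common way to take such a snapshot, sketched here as an assumption rather than a description of Tribler's detector, is a watchdog thread that reads the event-loop thread's current frame through `sys._current_frames()` (a CPython-specific function) once a step has run longer than a threshold. `StepWatchdog`, its parameters, and `slow_step_stand_in` are hypothetical; in a real event loop, `begin_step()` and `end_step()` would be called around each `Handle._run`.

```python
import sys
import threading
import time
import traceback


class StepWatchdog:
    """Hypothetical helper: samples a thread's stack when a step runs too long."""

    def __init__(self, threshold=0.1, poll_interval=0.02):
        self.threshold = threshold
        self.poll_interval = poll_interval
        self.thread_id = threading.get_ident()  # thread that runs the event loop
        self.step_started = None  # perf_counter() value while a step runs, else None
        self.reports = []  # formatted stacks captured during slow steps
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._worker.start()

    def stop(self):
        self._stop.set()
        self._worker.join()

    def begin_step(self):
        self.step_started = time.perf_counter()

    def end_step(self):
        self.step_started = None

    def _watch(self):
        reported_for = None  # start time of the step already reported, to avoid duplicates
        while not self._stop.wait(self.poll_interval):
            started = self.step_started
            if started is None or started == reported_for:
                continue
            if time.perf_counter() - started > self.threshold:
                # Grab the frame the watched thread is executing right now.
                frame = sys._current_frames().get(self.thread_id)
                if frame is not None:
                    self.reports.append("".join(traceback.format_stack(frame)))
                    reported_for = started


def slow_step_stand_in():
    time.sleep(0.4)  # stands in for a slow synchronous method


watchdog = StepWatchdog(threshold=0.1)
watchdog.start()
watchdog.begin_step()
slow_step_stand_in()
watchdog.end_step()
watchdog.stop()
print(watchdog.reports[0])
```

With the stand-in, the captured stack ends at the `time.sleep` line inside `slow_step_stand_in`, the innermost frame executing when the watchdog sampled it.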

Some registered events with slow execution of the coroutine:

In these cases, the actual Core crash, caused by an invalid pointer access, was probably unrelated to the slow coroutine execution; still, the coroutine step running TunnelCommunity.do_remove was the slowest step recorded before the crash.

The do_remove method does not seem to have any single line that is especially slow (otherwise, the stack trace would point to that specific line), so the method likely needs general performance optimization.
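When no single line dominates, a deterministic profiler such as the standard library's `cProfile` can break one call down by function to show where the time goes. The sketch below profiles a hypothetical stand-in, since the body of `do_remove` is not shown in this issue: `FakeCircuit`, `do_remove_stand_in`, the 60 s idle limit, and the 100,000 circuits are invented for illustration and are not taken from ipv8.

```python
import cProfile
import io
import pstats
import time


class FakeCircuit:
    """Hypothetical stand-in for a circuit object with an activity timestamp."""

    def __init__(self, last_activity):
        self.last_activity = last_activity

    def is_expired(self, now, max_inactive):
        return now - self.last_activity > max_inactive


def do_remove_stand_in(circuits, max_inactive=60.0):
    """Hypothetical stand-in: drop every circuit that has been idle too long."""
    now = time.monotonic()
    expired = [cid for cid, c in circuits.items() if c.is_expired(now, max_inactive)]
    for cid in expired:
        del circuits[cid]
    return len(expired)


start = time.monotonic()
# Odd-numbered circuits have been idle for 120 s; even-numbered ones were just used.
circuits = {i: FakeCircuit(start - (120.0 if i % 2 else 0.0)) for i in range(100_000)}

profiler = cProfile.Profile()
profiler.enable()
removed = do_remove_stand_in(circuits)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
print(out.getvalue())
```

Against the real TunnelCommunity, the same `enable()`/`disable()` pair would wrap a single `do_remove()` call; a profile in which many small calls each take a modest share would be consistent with the observation that no single line stands out.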