Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

[7.13.0-RC1] OSError: [WinError 10014] Bad address. #7352

Open sentry-for-tribler[bot] opened 1 year ago

sentry-for-tribler[bot] commented 1 year ago

Sentry Issue: TRIBLER-14S

CoreConnectionError: The connection to the Tribler Core was lost
OSError: [WinError 10014] Bei dem Versuch das Zeigerargument eines Aufrufs zu

Translation:

CoreConnectionError: The connection to the Tribler Core was lost
OSError: [WinError 10014] When attempting to use the pointer argument of a call ...

https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2:

WSAEFAULT 10014 Bad address. The system detected an invalid pointer address in attempting to use a pointer argument of a call. This error occurs if an application passes an invalid pointer value, or if the length of the buffer is too small. For instance, if the length of an argument, which is a sockaddr structure, is smaller than the sizeof(sockaddr).

Traceback (most recent call last):
  File "run_tribler.py", line 97, in <module>
  File "tribler\core\start_core.py", line 201, in run_core
  File "tribler\core\start_core.py", line 166, in run_tribler_core_session
  File "asyncio\base_events.py", line 603, in run_until_complete
  File "asyncio\base_events.py", line 570, in run_forever
  File "asyncio\base_events.py", line 1823, in _run_once
  File "selectors.py", line 323, in select
  File "selectors.py", line 314, in _select
OSError: [WinError...

Only a single user has been affected so far.

sentry-for-tribler[bot] commented 1 year ago

Sentry issue: TRIBLER-12M

drew2a commented 1 year ago

@kozlovsky this bug seems to be in the same scope as the bugs you are fixing now.

kozlovsky commented 1 year ago

@drew2a: For me, it looks like a different bug. This is my current understanding of the error:

  1. The problem did not originate in the 7.13 release; according to Sentry, it was already present in at least 7.12.1.

  2. According to Sentry, only one user is currently affected by this problem, but that user is affected seriously: we have eleven reports from the same user.

  3. It happens inside the run_tribler_core_session function, on the loop.run_until_complete(core_session(...)) line. At the moment of the error, the asyncio code is executing event_list = self._selector.select(timeout). The actual Python code running at that point is:

        if sys.platform == 'win32':
            def _select(self, r, w, _, timeout=None):
                r, w, x = select.select(r, w, w, timeout)
                return r, w + x, []
    
        def select(self, timeout=None):
            timeout = None if timeout is None else max(timeout, 0)
            ready = []
            try:
                r, w, _ = self._select(self._readers, self._writers, [], timeout)  # <-- here
                ...

  4. The error description, according to MSDN, is:

    WSAEFAULT 10014 Bad address. The system detected an invalid pointer address in attempting to use a pointer argument of a call. This error occurs if an application passes an invalid pointer value, or if the length of the buffer is too small. For instance, if the length of an argument, which is a sockaddr structure, is smaller than the sizeof(sockaddr).

    I doubt that the cause of the error in our case is an invalid pointer value. The second possible cause, that the length of some buffer is too small, looks more probable to me.

  5. I found several projects with a similar error; this is how the problem was solved in one project:

    The function getaddrinfo("localhost", portStr, NULL, &ainfo), used that way, was returning an IPv6 address, while accept was given a sockaddr_in, which is the struct for an IPv4 address. It could probably be solved in several ways, for example:

    • using sockaddr_in6 for IPv6 communication
    • telling getaddrinfo to search only IPv4 results with the 3rd argument
    • picking up next result in the linked list returned by getaddrinfo

    I chose to manually init the socket for the IPv4 protocol

    In another Rust-based project, the problem is described in the following way:

    Repo on Windows 10. Seems to crash on OS X as well.

    The localhost is resolved to the IPv6 address [::1] and then the IPv4 address 127.0.0.1, and UdpSocket will only bind to the first address if successful. So the UDP socket is bound to an IPv6 address, and can only send data to IPv6 hosts.

    Meanwhile, example.com is first resolved to the IPv4 address 93.184.216.34 and then the IPv6 one [2606:2800:220:1:248:1893:25c8:1946] (if the OS supports that). The implementation of send_to just sends the data to the first address, regardless of whether the result is Ok or Err.

    The 10014 is due to sending data from an IPv6 socket to an IPv4 host.

    So there are two problems shown from this issue:

    1. maybe send_to should use each_addr, or at least filter the address family?
    2. need to have some way to prioritize IPv4 address for localhost when using ToSocketAddrs. (IPv6 support is not good on internet, even if send_to picked [2606:...] it could still fail with 10049)
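
For comparison, the workarounds quoted above (restricting getaddrinfo to one address family, or filtering the results by family) could look roughly like this in Python. This is a minimal sketch, not Tribler code: `resolve_matching` is a hypothetical helper name, and it assumes the socket's own family is the one that should be used for the destination.

```python
import socket


def resolve_matching(sock, host, port):
    """Resolve host:port to socket addresses of the same family as `sock`.

    Hypothetical helper for illustration only. Passing `family=` to
    getaddrinfo restricts the results, so an AF_INET socket never
    receives an IPv6 destination (and vice versa).
    """
    infos = socket.getaddrinfo(host, port, family=sock.family, type=sock.type)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # only the sockaddr tuple is needed for sendto() or connect().
    return [info[4] for info in infos]
```

With an AF_INET UDP socket, `resolve_matching(sock, "localhost", port)` should return only IPv4 addresses. If the host has no address in the socket's family, getaddrinfo raises `socket.gaierror` instead of returning a mismatched address.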

We probably have the same cause for the error: perhaps we are sometimes sending data from an IPv6 socket to an IPv4 host?
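
One possible way to narrow this down is to find out which registered socket makes the select call from item 3 fail. The sketch below probes each registered file object on its own. `find_bad_fileobjs` is a hypothetical helper, not part of Tribler or asyncio, and it is only an assumption that WinError 10014 would reproduce when a socket is probed in isolation.

```python
import select
import selectors


def find_bad_fileobjs(selector):
    """Return (fileobj, error) pairs for registered objects that select() rejects.

    Hypothetical diagnostic helper: each registered descriptor is polled
    individually with a zero timeout, so a failure can be attributed to
    one specific socket instead of the whole reader/writer set.
    """
    bad = []
    for key in list(selector.get_map().values()):
        try:
            select.select([key.fd], [], [], 0)
        except (OSError, ValueError) as exc:
            bad.append((key.fileobj, exc))
    return bad
```

Running this against the event loop's selector would mean reaching into the private `_selector` attribute visible in the traceback, so it is suitable for a debugging build only.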

github-actions[bot] commented 3 months ago

This issue has not seen activity for 60 days. It is now marked as stale. Please provide additional information or this issue may be closed in the future. We value your contribution and would love to hear more!