SilentBot1 opened 1 year ago
To add some further details to this: after testing within client_test (included in the examples folder) with --alert_mask=tracker,torrent_log enabled, it appears that once the client becomes unresponsive to the tracker, the following alerts are fired when an offer is received from the tracker:
*** WEBSOCKET_TRACKER_READ
*** RTC signaling processing remote offer
*** RTC signaling creating connection
but no further *** RTC signaling generated answer or *** WEBSOCKET_TRACKER_WRITE alerts are fired.
By comparison, when the tracker is working, all five events fire in order:
*** WEBSOCKET_TRACKER_READ
*** RTC signaling processing remote offer
*** RTC signaling creating connection
*** RTC signaling generated answer
*** WEBSOCKET_TRACKER_WRITE
Based on this, it appears to go astray between rtc_signaling::process_offer and rtc_signaling::create_connection, before rtc_signaling::on_generated_answer is called.
Sorry for the direct ping, but is this something you might know about or be able to assist with, @paullouisageneau?
Thank you for the very detailed report!
I think what's happening here is that libdatachannel encounters an error while creating the connection and generating the answer, but it's silently ignored here. Possible reasons for such a failure are running out of file descriptors or running out of usable UDP ports. Enabling DEBUG_RTC here might give information about what's happening under the hood.
The WebTorrent connection process is extremely greedy in terms of UDP sockets, as each offer requires one, so each batch creates 10 of them. There might be an issue with cleaning up failed connections, resulting in a file descriptor leak, or there may simply be too many open file descriptors (the default limit on Linux is 1024). I'll look into it as soon as I get a moment.
No worries at all and thank you for looking into this when you have time.
After running the same test as above, now with DEBUG_RTC set to 1 and with the flags --alert_mask=tracker,torrent_log,session_log, I can see the following debug logs once the issue starts occurring, when an announce comes in after the client stops responding:
*** RTC signaling processing remote offer
*** RTC signaling creating connection
libdatachannel: debug rtc::impl::Certificate::Generate@213: Generating certificate (OpenSSL)
libdatachannel: debug rtc::impl::IceTransport::IceTransport@49: Initializing ICE transport (libjuice)
libdatachannel: info rtc::impl::IceTransport::IceTransport@104: Using STUN server "stun.l.google.com:19302"
libdatachannel: info rtc::impl::PeerConnection::changeSignalingState@1166: Changed signaling state to have-remote-offer
libdatachannel: debug rtc::impl::PeerConnection::processLocalDescription::<lambda_7d5d78206bf58d91eaee80ae3f35c4c1>::operator ()@837: Reciprocating application in local description, mid="0"
libdatachannel: info rtc::impl::PeerConnection::changeSignalingState@1166: Changed signaling state to stable
libdatachannel: info rtc::impl::PeerConnection::changeGatheringState@1153: Changed gathering state to in-progress
libdatachannel: info rtc::impl::IceTransport::LogCallback@342: juice: agent.c:1012: Changing state to gathering
libdatachannel: info rtc::impl::IceTransport::LogCallback@342: juice: agent.c:1012: Changing state to connecting
libdatachannel: info rtc::impl::PeerConnection::changeState@1135: Changed state to connecting
libdatachannel: info rtc::impl::IceTransport::LogCallback@342: juice: agent.c:399: Using STUN server stun.l.google.com:19302
This appears to be missing the further steps that are logged on a successful offer generation, namely:
libdatachannel: info rtc::impl::IceTransport::LogCallback@342: juice: agent.c:1298: Got STUN mapped address REDACTED:57539 from server
libdatachannel: info rtc::impl::IceTransport::LogCallback@342: juice: agent.c:2277: Candidate gathering done
libdatachannel: info rtc::impl::PeerConnection::changeGatheringState@1153: Changed gathering state to complete
*** RTC signaling generated answer
I'll see if I can do a bit of further digging.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
stale, no!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'll look into it.
Please provide the following information

libtorrent version (or branch): master @ https://github.com/arvidn/libtorrent/commit/cbc4352f30c762740c53b74fef732d756aa313cb

Platform/architecture: Windows 11 x64, Ubuntu 22.04 x64

Compiler and compiler version:
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Ubuntu 22.04 x64
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30145 for x64 on Windows 11 x64
Please describe what symptom you see, what you would expect to see instead and how to reproduce it:
When using libtorrent 2.1.0.0 (libtorrent@cbc435) compiled for qBittorrent v4.6.0alpha1, in conjunction with bittorrent-tracker (bittorrent-tracker@0b581e) and aquatic-ws (aquatic-ws@5ac8f3), I have noticed that when there are multiple announces at the same time via a websocket tracker, some instances of the tracker get stuck in the updating state.
This can be observed with a libtorrent client connected to a websocket tracker: when multiple announces are performed at the same time, some torrents have their instance of the websocket tracker stuck in the updating state. These no longer send further announcements, and never transition out of the updating state without removing/re-adding the tracker or restarting the client.
This effect gets progressively worse over time, until all instances are stuck in the updating state, with only websocket ping requests being sent back and forth between the tracker and the client.
Via a packet capture, I have confirmed that every announce sent to the websocket tracker receives a valid response, yet some of them still get stuck in the updating state.
In my testing with 10 new, unique torrents announcing only to a local websocket tracker with a 60-second announce interval, my results were:
As a side note, with only a single torrent, the tracker appears to have died after 48 announces / 48 minutes / 480 offers.
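For what it's worth, the "stuck in updating" symptom can be detected without inspecting libtorrent internals: record when each announce is sent (e.g. on tracker_announce_alert) and clear it on a response (tracker_reply_alert / tracker_error_alert); anything outstanding far longer than the announce interval is stuck. A self-contained sketch of that bookkeeping (the alert wiring is omitted; the AnnounceWatchdog class and its grace period are my own illustration, not libtorrent API):

```cpp
#include <chrono>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Tracks outstanding announces per tracker URL and reports the ones
// that never received a reply within the given grace period.
class AnnounceWatchdog {
public:
    explicit AnnounceWatchdog(std::chrono::seconds grace) : grace_(grace) {}

    // Call when an announce is sent (tracker_announce_alert).
    void on_announce(const std::string& url) { pending_[url] = Clock::now(); }

    // Call on a response (tracker_reply_alert or tracker_error_alert).
    void on_reply(const std::string& url) { pending_.erase(url); }

    // URLs whose announce has been outstanding longer than the grace period.
    std::vector<std::string> stuck(Clock::time_point now = Clock::now()) const {
        std::vector<std::string> out;
        for (const auto& [url, sent] : pending_)
            if (now - sent > grace_) out.push_back(url);
        return out;
    }

private:
    std::chrono::seconds grace_;
    std::unordered_map<std::string, Clock::time_point> pending_;
};
```

With the 60-second interval used in the test above, a grace period of, say, 120 seconds would flag exactly the trackers that got stuck in the updating state.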
Steps to reproduce:
- libtorrent built with the -Dwebtorrent flag enabled.
- DEBUG=* environment variable set.

If any further information is needed to help in the debugging of this, please don't hesitate to let me know.