caisan / libtorrent

Automatically exported from code.google.com/p/libtorrent

DHT node count drops to 0 #368

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Start either the Deluge or the qBittorrent client.
2. Wait.
3. Watch the nodes drop to zero and stay there.

What is the expected output? What do you see instead?

I'd much prefer to remain connected to at least enough nodes to leave DHT 
functional.

What version of the product are you using? On what operating system?

0.15.10 in Deluge 1.3.5
0.16.3 in qBittorrent 3.0.0
Windows 7, both x86 and x64 versions (on 2 separate computers).

Please provide any additional information below.

I have noticed this same problem with 2 different BitTorrent clients that are 
built around libtorrent.  Deluge (current version 1.3.5 uses libtorrent 
0.15.10) and qBittorrent (current version 3.0.0 uses libtorrent 0.16.3) both 
lose connection to the DHT network.  Everything starts fine, and I am able to 
connect to a couple hundred nodes.  After either client has been running for a 
short time, the connected nodes count starts to gradually drop.  Within a 
couple of hours the connected nodes count is 0 and DHT (obviously) no longer 
functions.  The node count doesn't increase again until the client is 
restarted.  Both clients have issue tickets open for this, but it seems like 
the core problem may be with libtorrent.  Frustratingly, this doesn't seem to 
affect too many people - or at least not many that will bother reporting it on 
the bugtrackers - so it may be difficult to reproduce.  Please advise if I can 
provide any further info or assistance/testing.

Original issue reported on code.google.com by nertberble on 10 Sep 2012 at 8:17

GoogleCodeExporter commented 8 years ago
I'm not sure, but this may be related to this issue: 
http://code.google.com/p/libtorrent/issues/detail?id=365

Original comment by hell...@gmail.com on 11 Sep 2012 at 2:38

GoogleCodeExporter commented 8 years ago
Hmmm... doesn't really sound like my issue.  Only my DHT node count drops, I 
have no problem keeping seeds (so far at least).  Also, the behaviour is not 
dependent on how many torrents are running - it happens within a couple of hours of 
starting the client even if no torrents are even loaded, let alone running.  I 
generally try not to have 300+ torrents running at a time, let alone 1000 like 
in issue 365.  Thanks for trying, though.

Original comment by nertberble on 11 Sep 2012 at 10:19

GoogleCodeExporter commented 8 years ago
If you're up for rebuilding libtorrent, there's an option to enable DHT logging.

define TORRENT_DHT_VERBOSE_LOGGING

It will produce a few DHT log files in current working directory, primarily 
dht.log

Original comment by arvid.no...@gmail.com on 11 Sep 2012 at 2:59

GoogleCodeExporter commented 8 years ago
Well, that's certainly not impossible, but is not ideal.  Firstly, I'm using 
Windows, which is absolutely not a developer platform.  I have had little 
enough success through the years compiling from scratch under Linux, which is 
far more suited to the task than is Windows.  I wouldn't even know where to 
start to build from source under Windows.  Secondly, how would I get the 
clients to use this newly built version of libtorrent rather than the versions 
they ship with?  Wouldn't I have to rebuild them from scratch as well?  
Needless to say, programming isn't my area of expertise.  I'm willing to give 
it a try, but would need a lot of assistance with the process, I'd imagine.  
And why is this option not already activated?  I could see building in a config 
setting to suppress this logging if desired, but it really should be on by 
default.  It's logging, after all - absolutely essential for product support.

Original comment by nertberble on 12 Sep 2012 at 2:26

GoogleCodeExporter commented 8 years ago
I can confirm this problem.

I've tested the following versions on Windows 7 (same computer); all are affected:
qBittorrent 3.0.0 with libtorrent 0.16.2.0 (official build)
qBittorrent 3.0.2 with libtorrent 0.16.3.0 (official build)
qBittorrent 3.0.4 with libtorrent 0.16.3.0 (compiled with 
TORRENT_VERBOSE_LOGGING and TORRENT_DHT_VERBOSE_LOGGING)

From the last one I've got libtorrent logs. I took two runs of logs.
First, qBittorrent was started without any torrents. DHT connects, nodes go up for a 
while and after less than half an hour they've dropped to zero. Then, at around 
32 minutes I added a trackerless DHT only magnet link. After around 45 minutes 
of stalled torrent and 0 DHT nodes, I just shut down qBittorrent, saved and 
cleared the logs.

The second time, qBittorrent was started again with the previously stalled torrent 
still active. The torrent starts downloading quickly as expected, but after a 
while DHT nodes start going down again and eventually reach zero again. The 
torrent keeps going nevertheless and is eventually finished.

At this point, when I stopped the torrent and added another trackerless magnet 
link, the added torrent remains stalled and DHT node count at zero. If I resume 
the other torrent, it starts seeding slowly, but DHT node count remains at 
zero. After waiting for a while, no activity in the added torrent, still 0 DHT 
nodes. I shut down qBittorrent.

I'm attaching the first run log here, the second one is too large but if it's 
needed I can upload it somewhere.

Original comment by jfrob...@gmail.com on 22 Sep 2012 at 12:52

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for taking the time to recompile and provide logs, jfroberg.

@arvid: A lot of qBittorrent users are reporting similar issues on Windows. I 
haven't been able to reproduce on Linux or Mac OS X. From what I gathered, it 
may be related to magnet links.

Original comment by dch...@gmail.com on 23 Sep 2012 at 3:12

GoogleCodeExporter commented 8 years ago
The logs suggest that nobody responds to DHT requests, starting at 1 minute 
and 46 seconds. Not even the router nodes respond, which indicates that the DHT 
messages probably get dropped somewhere.

Is this a Windows-only problem?

It would be interesting to see if it works with the DHT rate limit set to 
unlimited (default is 4 kB/s).

Original comment by arvid.no...@gmail.com on 24 Sep 2012 at 2:32

GoogleCodeExporter commented 8 years ago
I have tested qBittorrent on another computer running Gentoo on the same 
network. I kept it running for a few days and the DHT node count was stable the 
whole time.

Also, on the same Windows computer I've tried other clients that support DHT 
without a similar problem, so there shouldn't be anything amiss with the network 
or the computer itself.

So to me it looks like the problem exists only in the Windows build.

I tried looking for DHT rate limit setting from qBittorrent, apparently it's 
either not supported or not configurable (at least directly). After some quick 
grepping on the libtorrent source, I tried changing dht_upload_rate_limit(4000) 
value in session.cpp to 4000000 to see if it made any difference. I rebuilt 
libtorrent and qBittorrent, and tried to repeat the earlier scenario in the 
first_run.zip. Same result, I've attached the log just in case.

Apparently my tweak wasn't enough, or wasn't correct at all, or qBittorrent 
overrides it at some point.

Original comment by jfrob...@gmail.com on 24 Sep 2012 at 1:56

Attachments:

GoogleCodeExporter commented 8 years ago
I've tried to reproduce this on Windows (XP) without success. It could be 
specific to Vista+ or Win7+.

My theory right now is that there's some error that happens on the UDP socket, 
causing libtorrent to stop listening on it. There is currently a list of errors 
that are considered non-fatal, from which libtorrent will keep reading, but in 
this case my guess is that there is a non-fatal error which is interpreted as 
fatal.

There should be an alert posted when there is a UDP socket error. I will make 
it log to the session log as well.

Original comment by arvid.no...@gmail.com on 25 Sep 2012 at 2:22

GoogleCodeExporter commented 8 years ago
jfroberg, any chance you could apply this patch and try again?

Index: src/session_impl.cpp
===================================================================
--- src/session_impl.cpp        (revision 7452)
+++ src/session_impl.cpp        (working copy)
@@ -2478,6 +2478,12 @@
                        if (e != asio::error::operation_aborted
                                && m_alerts.should_post<udp_error_alert>())
                                m_alerts.post_alert(udp_error_alert(ep, e));
+
+#if defined TORRENT_VERBOSE_LOGGING || defined TORRENT_LOGGING || defined TORRENT_ERROR_LOGGING
+                       char msg[200];
+                       snprintf(msg, sizeof(msg), "UDP socket error: (%d) %s", e.value(), e.message().c_str());
+                       (*m_logger) << msg << "\n";
+#endif
                        return;
                }

This will log every UDP socket error, in addition to posting the alert.

Original comment by arvid.no...@gmail.com on 25 Sep 2012 at 2:40

GoogleCodeExporter commented 8 years ago
All right, I removed the earlier hack I tried and applied the patch you 
provided.
Here's the log, same drill as in previous ones.

Original comment by jfrob...@gmail.com on 25 Sep 2012 at 3:12

Attachments:

GoogleCodeExporter commented 8 years ago
Thank you very much! As it turns out, Windows has multiple different error 
codes for the same errors. I was looking for WSAE* errors, but the same errors 
also exist as ERROR_* errors (with different error codes).

This was a problem because the UDP socket implementation in libtorrent is 
conservative and treats any unknown error as fatal, to avoid getting into a 
busy loop of failures. Only errors that are known to be non-fatal cause a 
retry in reading from the socket.
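
As a rough illustration of that conservative policy, the C++ sketch below classifies a numeric socket error as either retryable or fatal. The function name, the enum, and the exact set of codes listed are assumptions made for this example and are not libtorrent's actual code; the numeric values are the Windows SDK values of the named WSAE* and ERROR_* constants, repeated as plain integers so the sketch compiles on any platform.

```cpp
// Numeric values of a few Windows error constants. Each network failure
// appears twice: once as an ERROR_* system code and once as a WSAE* code.
namespace win_err {
    const int error_connection_refused  = 1225;  // ERROR_CONNECTION_REFUSED
    const int error_network_unreachable = 1231;  // ERROR_NETWORK_UNREACHABLE
    const int error_host_unreachable    = 1232;  // ERROR_HOST_UNREACHABLE
    const int error_port_unreachable    = 1234;  // ERROR_PORT_UNREACHABLE
    const int error_connection_aborted  = 1236;  // ERROR_CONNECTION_ABORTED
    const int wsa_enetunreach           = 10051; // WSAENETUNREACH
    const int wsa_enetreset             = 10052; // WSAENETRESET
    const int wsa_econnaborted          = 10053; // WSAECONNABORTED
    const int wsa_econnreset            = 10054; // WSAECONNRESET
    const int wsa_econnrefused          = 10061; // WSAECONNREFUSED
    const int wsa_ehostunreach          = 10065; // WSAEHOSTUNREACH
}

enum udp_action { retry_read, stop_reading };

// Hypothetical helper: only codes on the allow-list lead to another read.
// Anything unknown is treated as fatal so a persistent failure cannot
// turn into a busy loop.
udp_action classify_udp_error(int code)
{
    switch (code) {
        case win_err::error_connection_refused:
        case win_err::error_network_unreachable:
        case win_err::error_host_unreachable:
        case win_err::error_port_unreachable:
        case win_err::error_connection_aborted:
        case win_err::wsa_enetunreach:
        case win_err::wsa_enetreset:
        case win_err::wsa_econnaborted:
        case win_err::wsa_econnreset:
        case win_err::wsa_econnrefused:
        case win_err::wsa_ehostunreach:
            return retry_read;
        default:
            return stop_reading;
    }
}
```

The trade-off with an allow-list like this is the one seen in this thread: a harmless code that is missing from the list silently stops the socket, so each newly observed code has to be added by hand.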

I've checked in a fix to RC_0_16 and trunk. I'm attaching here for completeness.

If you feel like trying it out, please let me know if there are any problems. I 
feel pretty confident though.

Original comment by arvid.no...@gmail.com on 25 Sep 2012 at 6:49

Attachments:

GoogleCodeExporter commented 8 years ago
Thank you Arvid for the correction you provided.

However, I'm afraid that this might not be over yet. I applied the patch you 
provided and recompiled both libtorrent and qBittorrent. I have made five test 
attempts like the earlier one, i.e. starting qBittorrent without any torrents and letting it 
run for a while. First attempt failed, the DHT nodes disappeared again as they 
did before patching, it just took a while longer this time. I was too perplexed 
by this to start a torrent before I shut down qBittorrent but I did save the 
logs though.

The next three attempts were successful in that DHT appeared to be fully functional 
and the number of DHT nodes was stable for quite a while. The fifth run failed as before, 
but the logs were enormous due to qBittorrent running overnight and well into the 
afternoon, so I didn't save those. I did try to start a trackerless torrent, but it 
did not start.

I'm including the logs of the first failed attempt, they are a bit longish as 
well as I had some other business to attend to and had to leave the program 
running for a longer time.

I'll keep trying to make the problem appear again and obtain a better log that 
includes loading a torrent as well.

Original comment by jfrob...@gmail.com on 27 Sep 2012 at 1:05

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks for helping out with this. Really, the log lines I'm interested in are 
the ones of this form:

UDP socket error: (<code>) <message>

In this case, the last such error was:

UDP socket error: (10052) The connection has been broken due to keep-alive 
activity detecting a failure while the operation was in progres

I've added this error code as non-fatal as well. If you make another 
overnight-run, you could just grep for "UDP socket error" in the logs.

Original comment by arvid.no...@gmail.com on 27 Sep 2012 at 5:39

GoogleCodeExporter commented 8 years ago
Came here via Google after experiencing this same issue with 1.3.5 on Windows 7 
x64. Running the daemon and connecting the client to that. After running the 
daemon for several days DHT nodes were 0 and would not budge from that. 
Restarting the daemon immediately connected to 27 nodes.

Original comment by a...@zweimiller.com on 30 Apr 2013 at 8:20

GoogleCodeExporter commented 8 years ago
I have had the same issue for some time, running on Linux: 
  3.11-2-686-pae i686 
  qBittorrent 3.1.3
  Libtorrent 0.16.11.0

After several hours of running fine I find the DHT node count at zero and 
torrents are therefore stuck (mostly seeding, but any new torrents added during 
this time usually do not manage to download; those that have already found some 
peers during the time DHT was working will continue).
I have found that turning off DHT in qbittorrent settings and then turning it 
back on again will start it working - no need to restart qbittorrent itself. 
But after this it will inevitably drop to zero at some point.

One point that might make my setup different: I use a SOCKS5 proxy, including 
for peer connections.

Original comment by a...@meiri.org on 29 Jan 2014 at 7:33