Closed colin1497 closed 6 years ago
@synctext I guess this should be assigned to whichever student is in charge of improving the tunnel community?
OK, yes. Another nice student job.
So the tunnel community has scaling problems. Good to hear from our users about this. With a nice performance graph, it's even performance analysis & re-factoring.
@Pathemeous #21 Seems the 1TByte seeding goal is difficult. Seeding anonymously already gives errors at >60 torrents.
Yes, this should be the first target to overcome (as expected). It seems that this is related to the big GUI refactor? Without a clean API such high-performance goals are bound to result in errors like these.
It's not related to it, but in principle we were planning to have it fixed for the wx3 release.
Depending on how long the tunnel community refactor will take, it could be that the improved tunnel community is not ready until after the new gui is ready, so maybe it's not worth worrying about breaking the gui parts of it for now.
If the WX3 part of this milestone is ready before the rest, we can still release that so we get in Debian/Ubuntu ASAP and split the rest for a future milestone/release.
Starting to target this one as it's the last issue assigned to me for 6.6. What do I have to work with? @colin1497 I see now that it's been open a while, what do you remember; do you have any stacktrace or log related to this issue?
My first guess is that downloading or seeding 60 anon torrents means creating >= 60 hidden tunnels, which is quite heavy on the CPU side. Having (blocking) Python calls for 60 tunnels probably results in e.g. Diffie-Hellman handshakes or intro-points timing out, which may be what is going on here. @whirm @synctext do you think this is a probable cause? Looking at the code there are many possible timeouts.
I'm afraid it's been a while. Looking back at #1605 I originally had 91 torrents, so it was well over 60. I haven't seen this in a long time. Besides build changes, I've also tripled my data rates with my ISP since I originally created the report. If your speculation is right, then you would hit it at some point. Maybe the issue is just that it shouldn't start all the tunnels in parallel? Maybe it should queue them up and start a max of 10 in parallel or something? Just thinking out loud.
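The "queue them up and start a max of 10 in parallel" idea above can be sketched roughly as follows. This is only an illustration, not Tribler's tunnel community code: `build_circuit`, `build_all` and `MAX_PARALLEL_BUILDS` are hypothetical names, the circuit build is simulated with a short sleep, and it uses the standard library's asyncio rather than Twisted so that it runs on its own.

```python
import asyncio

MAX_PARALLEL_BUILDS = 10  # hypothetical cap, taken from the "max of 10" suggestion


async def build_circuit(torrent_id, stats):
    """Stand-in for a real circuit build (hypothetical; no Tribler API is used)."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulates handshake latency
    stats["active"] -= 1
    return torrent_id


async def build_all(torrent_ids, limit=MAX_PARALLEL_BUILDS):
    """Build one circuit per torrent, with at most `limit` builds in flight."""
    semaphore = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}

    async def guarded(torrent_id):
        # Builds beyond the cap wait here instead of all starting at once.
        async with semaphore:
            return await build_circuit(torrent_id, stats)

    built = await asyncio.gather(*(guarded(t) for t in torrent_ids))
    return built, stats["peak"]


def run_demo(count=91, limit=MAX_PARALLEL_BUILDS):
    """Run the sketch for `count` simulated torrents."""
    return asyncio.run(build_all(range(count), limit))
```

Calling `run_demo()` simulates the 91-torrent case and returns the list of built ids together with the peak number of simultaneous builds, which stays at or below the cap. Whether such a cap actually avoids the timeouts is untested here; it only shows the queueing shape.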
Even if some requests time out it should still keep building circuits until it hits the circuit target.
@whirm sure, but if due to a large amount of circuits being built not a single one can actually be constructed, rescheduling them all concurrently will mean that all the newly scheduled circuits will timeout as well. Assuming that this is the issue of course.
@lfdversluis let's stop guessing and try to reproduce it instead. Once you've got a scenario where this happens.
If you don't have a shitty Internet connection, use wondershaper to fake it :)
If you want to limit the number of cores Tribler can use (this shouldn't make a huge difference) you can use taskset.
FYI, just updated to 6.6, 7cd6ed7402d22772e4a09c4520bec1b8553e1fc8 and it fails to build circuits. 103 seeded torrents.
Just looking at the Windows resource monitor it doesn't appear to be CPU bound. Resource monitor shows lots of disc activity on the mechanical drive where torrents are stored (60-100MB/s). Network activity rate is relatively low, well under 1Mbps.
On exit, I get a log file each time. I have diffed a couple of the log files and they are basically the same:
Edit: After deleting my old tribler.conf file, it successfully built circuits and is now checking every one of the 103 torrents.
Edit2: Comparing the tribler.conf files, aside from the old one having some old options like t4t*, the big difference appears to be the "user_download_choice =" option with all the torrent hashes with "restartseed". I'm going to let it finish all these checks, shut down, and see what happens.
@colin1497 We still have not had a look at this, sorry. The credit rewards for seeding + credit mining have been our prime focus since Feb. Once that is done, the anon tunnels will get full attention.
No problem, just trying to give as much info as possible. After checks were completed I restarted and again no joy with almost an identical log file.
@colin1497 Thank you, that is very valuable info. It seems that the IO is too heavy and is probably completely blocking the twisted thread, most likely resulting in circuits timing out due to handshake failures and what not. I am currently in the process of making the IO non-blocking by pushing it out of the twisted thread in https://github.com/Tribler/dispersy/pull/481 but this migration is still underway. After dispersy, Tribler is next, including the tunnels.
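To illustrate the general idea of pushing blocking IO off the event-loop thread, here is a minimal sketch. It is not the code from the dispersy pull request: `check_torrent_blocking` and `run_check_off_loop` are hypothetical names, the disk work is simulated with `time.sleep`, and it uses the standard library's asyncio so that it is self-contained. In Twisted the comparable call is `twisted.internet.threads.deferToThread`.

```python
import asyncio
import time


def check_torrent_blocking(name, duration=0.2):
    """Stand-in for a blocking disk hash check (hypothetical)."""
    time.sleep(duration)  # blocks whichever thread runs it
    return name


async def run_check_off_loop(name):
    """Run the blocking check in a worker thread and count loop ticks meanwhile."""
    loop = asyncio.get_running_loop()
    ticks = 0

    async def heartbeat():
        # Represents timers (e.g. handshake timeouts) that must keep firing.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.ensure_future(heartbeat())
    # The default thread pool executor does the work; the loop stays free.
    result = await loop.run_in_executor(None, check_torrent_blocking, name)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, ticks
```

Running `asyncio.run(run_check_off_loop("some-torrent"))` returns the name plus the number of heartbeat ticks that fired while the check was in progress. If `check_torrent_blocking` were called directly inside the coroutine instead, the heartbeat would not tick at all during the check, which is the kind of starvation suspected above.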
Hmm, looking at the log file I see ImportError: No module named csv, which should be shipped with Tribler.
File "twisted\internet\base.pyo", line 825, in runUntilCurrent
File "Tribler\Core\APIImplementation\LaunchManyCore.pyo", line 486, in session_getstate_usercallback_target
File "Tribler\Main\tribler_main.pyo", line 498, in sesscb_states_callback
File "Tribler\Main\Dialogs\systray.pyo", line 40, in updateTooltip
exceptions.AttributeError: 'ABCTaskBarIcon' object has no attribute 'icon'
is wx related; we are moving to Qt soon, so that should be fixed soon.
File "twisted\internet\defer.pyo", line 150, in maybeDeferred
File "Tribler\Core\Modules\versioncheck_manager.pyo", line 54, in check_new_version
File "twisted\web\client.pyo", line 1594, in request
File "twisted\web\client.pyo", line 1578, in _getEndpoint
File "twisted\web\client.pyo", line 1454, in endpointForURI
File "twisted\web\client.pyo", line 818, in raiseNotImplemented
exceptions.NotImplementedError: SSL support unavailable
means our version manager is broken? @devos50 what do you make of this?
I am relatively certain that I didn't get the log entries in the session where I deleted tribler.conf and it rechecked every file. I think that it's only happening when it never is able to build the circuits.
Edit: No - seems a clean install just starting tribler fdfd8db9ccccc1229bdf1be2b0908664f57613ad gives this log:
@colin1497 you need to install python-openssl
@colin1497 if you are running from git, you should install all the dependencies listed on debian/control
Downloading Windows installer builds from Jenkins. I shouldn't have to separately install dependencies in that scenario, should I?
ah, the unchecked latest Windows builds. Fresh from Jenkins.Tribler.org then?
These are not often checked to see if they function OK. It would be good to check whether this bleeding edge code, freshly installed, can seed just one swarm correctly in Anon mode.
@colin1497 The devel branch is almost exclusively used by developers that are adding additional dependencies (e.g. I am adding several at the moment). So often we add dependencies on our machines before we add them to the builders to check everything is working. The builders then ship these with the installers :)
As @synctext said, there are not regular checks on devel. Our next branch is far more stable, but we do not have any guarantees on this either. The only guarantee we do strive to deliver is that all dependencies are shipped with our installers (naturally). But if something is not working, do let us know so we can add it to our todo list.
Apologies guys, I had been pulling next branch builds previously, and had an issue in 6.5.2 and went to jenkins to grab latest build to see if same issue still existed. Geez, I can see that I clearly ended up grabbing devel branch versions. /facepalm
@colin1497 heh, no worries, at least we know it needs to be fixed now :)
@lfdversluis maybe this is due to the MSVC rebuild you did? Maybe you forgot something on the python-openssl dll chain.
Quick update since there was concern about CPU performance:
I tried a few things. I watched CPU usage and it didn't seem that high, not even enough to force the CPU to peg to its max frequency. I set up an idle priority, 100% usage application and pegged it to one core to force the CPU frequency high. I set Tribler to "realtime" priority level. No change in behavior.
I can get 20 connected peers, but can't build circuits for my seeds.
Looking at network usage, it's really not that high -- never goes over 1Mbps. I have Gb infrastructure and 50Mbps connection to the internet.
Obviously that's all macro level.
@colin1497 You discovered a problem in the tunnel community. The team made a good performance measurement test. Even with light load the tunnel may take 3 minutes to build.
btw Gb infrastructure, nice!
Good to hear I found a legit issue.
WRT infrastructure, we completely renovated a house last year and it's relatively ridiculous what all I did....
Just wanted to say that this remains a problem in the 7.0.2 release. I hadn't had an issue with it because:
1) I hadn't been in Tribler that much, and 2) At some point I lost my database and started clean with a lot fewer torrents, but
I'm up to the point where at startup basically everything just spins its wheels saying it's building circuits but none ever get going.
I'm assuming this to be fixed, but I'll add it to the 7.2 milestone for verification.
I'm pretty sure this issue has been fixed. Closing the issue. Please let me know if there are any other problems related to circuit building.
Seeding >60 torrents anon, tribler sometimes fails to build circuits and seed. Lots of info in #1605, see this comment specifically:
https://github.com/Tribler/tribler/issues/1605#issuecomment-142086993
Splitting this issue out for tracking purposes, may be related to #1682