Closed ichorid closed 5 years ago
Monitor-bezel-crushing image of Tribler module imports.
import is_valid_address
and import call_on_reactor_thread
respectively.Tribler.community.market
is in the center. It shows some signs of hierarchy.Zoomed in. Dispersy:
Market:
IPv8:
(p.s. used pyreverse
from pylint
to generate these. Had to remove all decorators, since it can't handle different decorators with the same name, or when a decorator has the same name as the function.)
It seems RemoteTorrentHandler
contains completely dead code. Like, it's not used anywhere, or never ran. And when you break it, nothing happens, Tribler works OK. It seems to only write torrents DB, and never read it.
Guys, could someone comment on this issue?
It should be used here in the SearchCommunity
:
https://github.com/Tribler/tribler/blob/b096b9bf48a3ac815c104408f84259b639ac7014/Tribler/community/search/community.py#L98
You should be able to trigger this through searching for a torrent.
The only place where something _rtorrent_handler
is used is:
https://github.com/Tribler/tribler/blob/b096b9bf48a3ac815c104408f84259b639ac7014/Tribler/community/search/community.py#L526
But it is never run!
https://github.com/Tribler/tribler/blob/b096b9bf48a3ac815c104408f84259b639ac7014/Tribler/community/search/community.py#L516
is always empty. You could literally comment out .download_torrent
, and Tribler GUI works as usual. Channels, search, anonymous downloads, mining - everything works.
So, what the rtorrent_handler
subsystem is intended to do?
Basically, message.candidate
is always empty here:
https://github.com/Tribler/tribler/blob/b096b9bf48a3ac815c104408f84259b639ac7014/Tribler/community/search/community.py#L509
@egbertbouman can you say anything about this?
It should collect 50k of torrents slowly in the background. The TFTP binary transfer of torrents never worked smoothly. You found another skeleton in our collection... We have an open issue for magnet links only Tribler operation. Discussion started 3 years ago #1150
I can probably answer your question about the remote torrent handler module.
Another purpose of the module, besides what @synctext says, is to speed up the process of fetching torrent metadata. In particular, when clicking on a torrent in Tribler (i.e. a search result or a torrent inside a channel), it would try to fetch the torrent from other candidates. The torrent file itself would be sent using the TFTP protocol and the data was sent over the Dispersy endpoint (since this endpoint was already NAT punctured by Dispersy). I even think there is a separate prefix for this.
It used to be this way in the old GUI. When I built the new GUI, I added support for fetching torrent metadata using HTTP/UDP trackers and the DHT. Since (re)adding support for the TFTP transfer of torrent metadata was not very high on my priority list, this feature was never added back to the GUI. The code still exists in the original code base, together with a few tests (which are not really unit tests...). I highly doubt whether we want to re-add this feature again. For now, I would be fine if you remove the TFTP module and code that manages fetching torrents from others. Let's discuss this next week 👍
@devos50 The RemoteTorrentHandler used to do a lot more than just TFTP torrent collecting, it's also capable of collecting from magnet/DHT/candidates. All this functionality is still in there, but I guess we stopped using this while creating the new GUI.
As far as TFTP goes, I don't think this has ever worked properly. If you look at the code you'll see the that it will only try to collect torrents that we already have :cry: This means that collect-requests are still coming in, but we never actually discover a new torrent.
I talked to everyone involved, and it seems the consensus is: "we used this stuff in the old days, but now its irrelevant, so it should go". So, its removal is on the list.
Well, before it is deleted... We need to know the consequences. No full .torrent collecting anymore? Have measurements proven that magnet-based BEP9 collecting is now fast enough? Is there a design for anonymous downloads and getting the .torrent?
@synctext, I've just ran Tribler with this stuff completely disabled, and it worked fine. Search, channels, downloads - everything ran fine. Torrent collecting is redundant now.
@synctext I looked into the TFTP torrent collecting, and at the moment it's not working at all. That means that we are only collecting torrents from the DHT/trackers after some user interaction (e.g., downloading a torrent). For testing purposes, I did a 1-line fix, and Tribler magically started collecting TFTP torrents. So, if we are OK with the amount of torrents we are collecting right now, we could remove it.
Note that because we are not currently collecting torrents in the background, we are essentially gossiping all our downloads within the SearchCommunity...
Thank you for finding this out! Quality of search result has been going down in Tribler, in my direct experience. That explains it, the torrent collected has gone down. How many torrents do people have on average? How many people have the healthy 50k torrents? Every Tribler client used to broadcast these health stats. How many dead torrents in the cache? We need a single dev for the content health I believe.
@synctext, we're gonna scrape all of this in favor of AllChannel2.0, the Heir to Dispersy. We definitely want to have some mechanism for background torrent metadata exchange, at least for anonymity purposes. However, there is no point in reclaiming this old code. For "Dispersy v1.9 refactored" release we could do away with the simple mechanism of returning more than one torrent on search/update request. That would be just enough to support anonymity and quality of service.
We first need a spam solution, protection against an attack in AllChannels mechanisms with merely $1000 worth of cloud servers. With solid foundation we can focus on fancy features. Please focus on linking Trustchain non-cheap identities to content discovery. Perfect if we can all do that in 1 upgrade, but I prefer to first remove the SyncAll broadcast in AllChannel community, upgrade to IPv8, and discard all channels votes with Trustchain > 2 hops (and a future Max-Edge-Flow > 1 GByte).
That is a lot of features to get flawlessly deployed, correct?
Trustchain non-cheap identities to content discovery. Perfect if we can all do that in 1 upgrade, but I prefer to first remove the SyncAll broadcast in AllChannel community, upgrade to IPv8, and discard all channels votes with Trustchain > 2 hops (and a future Max-Edge-Flow > 1 GByte).
- that is exactly what I meant by "Dispersy v1.9 refactored"! To put to use the years of Tribler experience, we have to keep the minimal core of successful code/algorithms from AllChannel/Dispersy. To remove obstacles to future developments, we have to port it to IPv8. To make it secure enough for MarketCommunity, we have to add Trustchain linking. No one talks about fancy features ))
Each object in our codebase has its own SQL access methods... Maybe we should discuss using ORM?
It would greatly simplify and unify all SQL-related stuff.
Yes, ORM is also on my 2019 wish list.
Our running old-IPv8 ORM code: http://svn.tribler.org/abc/branches/giorgos/datasync/sqlalchemy/schema.py
Please take the time to read about our operational 2010 work with this code: https://repository.tudelft.nl/islandora/object/uuid:52b586ea-6144-4b4e-a5a1-b05255ce493a
Plus IPv8 in 2010 with ORM: http://kayapo.tribler.org/trac/wiki/IPv8
There is a LOT of valuable experience in this code, so refactoring should be done carefully.
Impressive how fast you comprehend our code base.
@synctext, I have to, to be able to keep up with the pace of development...
IPv8 issues: Boilerplate code:
auth = BinMemberAuthenticationPayload(self.my_peer.public_key.key_to_bin()).to_pack_list()
dist = GlobalTimeDistributionPayload(self.claim_global_time()).to_pack_list()
payload = MyMessage(self.BLABLABLA).to_pack_list()
is a classic example of a boilerplate code that is used in every message in every community.
@qstokkink , is there some reason why this is not moved to IPv8 _ez_pack
, and the programmer have to do it by hand every time?
@ichorid the GlobalTimeDistributionPayload
is deprecated and I want everyone making a message to be discouraged from putting it in there, by forcing them to add this extra line.
@qstokkink, but it is in your wiki tutorial... What should we use instead of it?
@ichorid good point, I'll remove that.
I've done some quick check for duplicates in Triber with Clonedigger. Market community is the first victim:
Source files: 33
Click here to show/hide file names
Clones detected: 71
753 of 3066 lines are duplicates (24.56%)
Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False
(p.s. the analysis does not include unit tests.)
@devos50 , I guess this kind of analysis could be useful for your efforts with Market.
Looks like IPv8 has some duplicates too:
Source files: 8
Clones detected: 138
872 of 6993 lines are duplicates (12.47%)
Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False
Bear in mind, though, that Clonedigger's algorithm is language-agnostic and prone to false positives. So, take these stats with a proper grain of salt.
For the whole Tribler/next, without Dispersy and Test directories, and without submodules:
Source files: 272
Clones detected: 2822
4892 of 30018 lines are duplicates (16.30%)
Parameters
clustering_threshold = 10
distance_threshold = 5
size_threshold = 5
hashing_depth = 1
clusterize_using_hash = False
clusterize_using_dcup = False
Anyone willing to continue this work - welcome! Closing it for now.
After a year of refactoring, Dispersy is dead and Market is moved to a submodule. Here are the updated diagrams:
This is the big list of things I found out during the review.
TODO:
run_tribler.py: __main__
runs twice, refactor in 2-3 separate files.LaunchManyCore
should be completely refactored by splitting into the series of smaller objects/methods. Its torrent-handling functionality should be moved to the torrent manager.APIImplementation
dir should be scrapped completely after refactoringLaunchManyCore
.RemoteTorrentHandler.py
contains almost completely dead code. EDIT: it used to collect torrents in the background for anonymity and quality of service, but was broken a long time ago. Not gonna fix, should just redo this in AllChannel2.0.SqliteCacheDBHandler.py
is full of awful old code that should basically be completely gone when we move to AllChannel2.0. For example,PeerDB
is only written, and is never read. We should keep SQL usage to a minimum to be as stateless as possible.Notifier.py
- subjects should not be pre-set in Notifier. Instead, they should be defined dynamically by users of Notifier object, in their respective modules.TorrentDef.py
- includes dumb TorrentDefNoMetainfo object that forces everyone to check the object type. Should be refactored to become one with TorrentDef. Which should be moved into TorrentMgr module.TriblerStatistics
object should be used as a parent class for Session, instead of being wrapped around.