Open drew2a opened 8 months ago
As a starting point for discussion, I propose the following algorithm:
Popularity Community operates in this manner:
(Note: Steps 1, 2, and 3 remain unchanged from the current algorithm)
@dataclass
class RequestKnowledgeMessage:
    infohash: str
My idea is to first focus on stability (removing Gigachannels, keeping tags) and then radically alter the architecture. Let's not try to fix things that are not currently broken :thinking:
- Remove 2 out of 3 methods for content discovery
  - promote PopularityCommunity as the only way to discover novel hashes
  - remove channel sampling/pre-view and free-for-all channel mechanism
  - Remove in GUI and core at some time or hide
  - Release a stable release with this code
- Add a new message inside the PopularityCommunity
  - backwards compatible with older peers
  - new feature of shadow keys and Libtorrent ground truth on swarm size
  - ```Query, swarm-clicked, swarm-not-clicked, swarm-clicked-size-as-seen-by-Libtorrent, date, shadow-signature```
  - crawl new info
  - Web-of-trust: also the rendezvous will start producing limited crawl data
- New privacy-protected ClickLog-based discovery
  - New release which starts to utilise the new "ContentDiscovery" community and one-struct-to-rule-them-all
- Further releases (mixing 4 things all into 1 Tribler hopefully :pray: )
  - collecting further data for the Machine Learning Science part
  - collecting further data for web-of-trust
  - collecting further data for tag-based metadata enrichment (content, trust, and queries)
  - end of gigachannels

Removing 2 out of 3 Content Discovery Methods
During my effort on Friday to eliminate channel sampling/pre-view and the free-for-all channel mechanism, I encountered some obstacles. Even though my initial attempt wasn't successful, I've gained insights into how this can be achieved and can now provide more detailed estimations.
The removal process should begin on the GUI side. This involves:
Once the GUI components are addressed:
`metadata.db` part.

Following these changes, a majority of the channels will be eliminated. Any remnants can either be adapted or removed in future refactoring stages.
From my current understanding, these steps should take about one week to complete.
* Add a new message inside the PopularityCommunity
  * backwards compatible with older peers
  * new feature of shadow keys and Libtorrent ground truth on swarm size
  * `Query, swarm-clicked, swarm-not-clicked, swarm-clicked-size-as-seen-by-Libtorrent, date, shadow-signature`
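To make the field list above concrete, here is a minimal sketch of how such a record could be modelled. It is only an illustration: the class name `ClickLogEntry`, the field types, the JSON serialization, and the `signer` callback are all assumptions, not the actual Tribler or IPv8 payload definition, and no particular shadow-key scheme is implied.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClickLogEntry:
    """Hypothetical record mirroring the fields listed in the plan."""
    query: str                     # Query
    swarm_clicked: str             # infohash (hex) the user picked
    swarm_not_clicked: List[str]   # infohashes (hex) shown but not picked
    swarm_clicked_size: int        # swarm size as seen by Libtorrent
    date: int                      # Unix timestamp in seconds (assumed unit)
    shadow_signature: bytes = b""  # filled in by sign()

    def signable_bytes(self) -> bytes:
        """Serialize every field except the signature in a stable order."""
        payload = {
            "query": self.query,
            "swarm_clicked": self.swarm_clicked,
            "swarm_not_clicked": self.swarm_not_clicked,
            "swarm_clicked_size": self.swarm_clicked_size,
            "date": self.date,
        }
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def sign(self, signer: Callable[[bytes], bytes]) -> None:
        """Store whatever the caller-supplied signer returns for the payload."""
        self.shadow_signature = signer(self.signable_bytes())
```

Keeping the signature out of `signable_bytes()` means the same bytes can be recomputed for verification after signing; the real wire format would have to follow whatever serialization the community already uses.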
For this little part of the master plan I have the following implementation in mind:
Once that all works, the last remaining step is to update the search results to also make use of the preference relation instead of pure db-based text search.
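As a rough sketch of that last step, the text-search results could be re-ordered by how often each infohash was preferred for the same query. The function name `rerank`, the `(query, infohash)` tuple shape, and the exact-match comparison on queries are assumptions made for illustration; the real ranking may weigh things very differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rerank(query: str,
           text_results: List[str],
           preferences: Iterable[Tuple[str, str]]) -> List[str]:
    """Order text-search results by click count for this query.

    `preferences` holds hypothetical (query, clicked_infohash) pairs.
    The sort is stable, so results with equal click counts keep the
    order the text search gave them.
    """
    clicks = Counter(infohash for q, infohash in preferences if q == query)
    return sorted(text_results, key=lambda infohash: -clicks[infohash])


results = rerank(
    "linux",
    ["a", "b", "c"],
    [("linux", "c"), ("linux", "c"), ("linux", "b"), ("other", "a")],
)
```

Results that nobody has clicked fall back to the db-based order, so the preference data only refines the existing search rather than replacing it.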
A more detailed design (green blocks include the code to add), capturing some insights since my last post:
Changes:
`UserActivityComponent` above).
`torrent_finished_alert` in the first version, instead of waiting for a 50% threshold.

Disclaimer: this is still before writing even a single line of code; the design may change as I implement it.
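To illustrate the finished-download trigger mentioned in the changes, here is a minimal sketch of a tracker that remembers the last search and records a preference once a torrent from those results completes. The class `UserActivityTracker` and both method names are hypothetical, and the wiring to Libtorrent's `torrent_finished_alert` is left out: `on_torrent_finished` is simply assumed to be called when that alert fires.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class UserActivityTracker:
    """Hypothetical stand-in for the user-activity bookkeeping."""
    last_query: Optional[str] = None
    last_results: List[str] = field(default_factory=list)
    # Each entry: (query, finished infohash, infohashes that were not chosen)
    preferences: List[Tuple[str, str, List[str]]] = field(default_factory=list)

    def on_search(self, query: str, results: List[str]) -> None:
        """Remember the most recent query and the infohashes it returned."""
        self.last_query = query
        self.last_results = list(results)

    def on_torrent_finished(self, infohash: str) -> None:
        """Record a preference if the finished torrent came from the last search."""
        if self.last_query is None or infohash not in self.last_results:
            return
        losers = [r for r in self.last_results if r != infohash]
        self.preferences.append((self.last_query, infohash, losers))
```

Downloads that did not originate from the last search are ignored here, which is one possible policy rather than a settled design decision.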
Despite the protocol's apparent simplicity, `PopularityCommunity` is quite complex as it derives logic from `RemoteQueryCommunity`: https://github.com/Tribler/tribler/blob/20fb22453c8c5349c6de4b8e679a5b590792b7df/src/tribler/core/components/popularity/community/popularity_community.py#L22-L23

This inheritance was implemented in #5736
The current algorithm for metadata retrieval is as follows: https://github.com/Tribler/tribler/issues/7398#issuecomment-1721263221
https://github.com/Tribler/tribler/blob/20fb22453c8c5349c6de4b8e679a5b590792b7df/src/tribler/core/components/popularity/community/popularity_community.py#L81-L92
As `RemoteQueryCommunity` is going to be removed in `8.0.0`, we have to replace the algorithm for metadata retrieval.