cblgh / trustnet

a flexible and distributed system for deriving, and interacting with, computational trust
GNU Affero General Public License v3.0
132 stars 10 forks source link

Question about scaling #3

Open cristiano-belloni opened 3 years ago

cristiano-belloni commented 3 years ago

Hi, reading the blog post I saw this use case:

Filter the greater network to only show messages from the trusted network. Make it is togglable.

I'm wondering how TrustNet scales with many users. To calculate trust for one user, the whole trust network has to be loaded with that particular user as root, and I see two scenarios:

1) If the computation is centralized (i.e. made on the server), the server must re-calculate all the "personal" trust networks for each user every time the global trust network changes (i.e. one of the users changes trust score for another user or a new user is added). This would mean a lot of computations for a non-trivial network size.

2) If instead the computation is made on the client, the client has to have knowledge of the whole trust network, which is a privacy problem (user A is able to see how much user B trusts user C), and has to recalculate its trust ranking every time anything changes in the trust network - which again is a privacy problem (now user A knows when user B started to distrust user C) and a lot of messages to each users for any non-trivial network size.

Do you have any thoughts on how trustnet could scale in a medium (thousands of users) or big (millions of users) network?

cblgh commented 3 years ago

Hi @cristiano-belloni, apologies for the belated reply.

I think it is good to keep in mind the considerations TrustNet set outs to solve problems within. It is focused, primarily, on providing a solution for the problem of content moderation in peer-to-peer distributed social applications. This suggests that:

i. Edge computing is a better conceptual model than centralized cluster computing. Computations are performed on a user's local device. ii. Users have all the data they display. In the peer-to-peer networks I am involved with, such as Cabal and SSB, data streams in from peers. To display a like Alice made, I have to have that data on my machine as well. Therefor, to compute Alice's transitive trust, I also need to have knowledge of Alice's trust assignments. This drawback is touched upon in the paper's discussion.

TrustNet is also based on Appleseed, which is a local group trust metric. Appleseed usefulness is not limited to a global knowledge of all the rankings that have been made (as is the case for PageRank). Instead, you just need to have data for the area that you care about.

Regarding questions of performance, there are simple optimizations that can be done in the applications users run. From the blog:

To update the trust graph, we keep track of the latest message of each peer's log. When new messages arrive, scan them to see if there are any new trust assignments. If there are, signal that we need to reconstruct the trust graph.

Implementation-wise, one would reconstruct the graph after a timeout. The timeout is refreshed each time a new message comes in from a log. This ensures that the trust graph is not constantly being reconstructed, which could be the case if we recently reconnected to the Internet for the first time in a few weeks and are in the middle of syncing up with peers.

My off-hand guess is that most popular websites today are more computationally intensive than running the trustnet computation for one's own social graph—or viewing streamed online content.

Do you have any thoughts on how trustnet could scale in a medium (thousands of users) or big (millions of users) network?

Did you try the simulation? The default scenario spins up 1000 nodes, but you could tweak that to play around & get a sense for how the computations perform :)

Note: 1000 nodes is for the trust graph of a given person, not the given social network. I would sloppily guess that the relation between the nodes for a given person's trust graph & the size of the network scales around at least 1:25. So, for a trust graph of 1000 nodes, you might be looking at a social network of the order of 25 000 members in total. This is just me speculating in the moment, though, and was not a core concern of the scope or research I conducted as my part of my thesis :)