crev-dev / cargo-crev

A cryptographically verifiable code review system for the cargo (Rust) package manager.
Apache License 2.0
2.09k stars 89 forks source link

Make trust graph consider paths (aka flow). #45

Open dpc opened 5 years ago

dpc commented 5 years ago

This is related to #44 .

Right now WoT graph is build by just a cost-bounded flooding of a graph. This makes it possible for anyone to create a new CrevId, trust it, and this way artificially increase the possible count of reviewes for a given crate.

This algorithm should keep track of path (id(s) of that directly trusted this one), on each step to the root of the trust tree, and so that when calculating the count of reviews, it's possible to "merge" reviews coming out from a common path, for the purpose of calculating the total trust count.

Example: If you directly trust only one other CrevId, you can only have trust count equal to 1, for any given crate, no matter how many people reviewed it.

dpc commented 5 years ago

To explain:

             /- b
root -> a - c
             \- d

In this case, if both b and d create a review, it will count only as one review, to prevent a from being able to create and trust fake IDs, to increase review count.

ffranr commented 5 years ago

it will only count as one review

Am I correct in saying that the count, in this case, is made from a point of view?

Shouldn't we avoid trusting nodes who we suspect of trusting and creating fake nodes anyway?

dpc commented 5 years ago

Shouldn't we avoid trusting nodes who we suspect of trusting and creating fake nodes anyway?

Of course. But trust is sometimes broken, and accounts get compromised. Currently I believe that after crev gets popular, everyone should require at least 2 reviews to trust anything. And such review should be "uncorrelated". This issue is a method to make sure no account can increase the review count by creating sock-puppet accounts.

pimotte commented 5 years ago

I think this would be solved by computing the max-flow from the root to each package, where people are nodes. Does this match what you're looking for?

dpc commented 5 years ago

@pimotte I am not sure which algorithm you refer to, and I am not that well versed with graph algorithms. After brief looking at https://en.wikipedia.org/wiki/Maximum_flow_problem, I guess one could say I'm looking for max-flow, if all edges have flow of 1, the source is root of the WoT, and all trusted users that have reviewed the package are sinks.

dpc commented 5 years ago

Something to consider regarding max flow algorithm:

http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002600.html

dpc commented 4 years ago

I've read http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-August/002600.html and I pretty much agree with everything there.

This might affect #329 .

I was naively thinking about implementing a proper max-flow algorithm, but something like tracking number for each node, and splitting it between trusted 2nd-level node is probably easier and can be done breadth-first.

TODO:

pimotte commented 4 years ago

Could you elaborate a bit on what you mean with "tracking number"?

Other thoughts:

329 basically introduces the concept of "negative trust". The reporter doesn't phrase it as such, but if your network is saying someone has a certain level of trust and you want to downgrade that, that is what is going on. Introducing negative trust in the core mechanism will likely lead to an even more complex system. Advogato with a max-flow algorithm that works breadth-first seems like the perfect way to go about this. Personally, if that yields unexpected results for the user, I'd see that as a UX problem, rather than an algorithms problem. Those then could be resolved by hard-capping certain results based on reviews from the user or possbily allowing the user insight into why this result came to be, which might lead to a re-evaluation of the trust they have expressed in other parties.

This last bit is actually the counter-intuitive part about all these mechanisms. If I trust Alice and Alice trusts Eve, but I don't trust Eve, than that should lead me to re-evaluate my trust in Alice. If a crev result shows me this, it's very easy to say "oh, but I want to downgrade Eve, so let's tell that to crev". If the user can see that crev is trusting Eve because of Alice, then I get to choose: I downgrade my trust in Alice, since her judgment of Eve affects it, or I adjust my assessment of Eve, because I trust Alice's judgment of her above my own. In neither case I'll input something directly related to Eve, even though she "started" this.

dpc commented 4 years ago

but something like tracking number for each node

That's just my way of summing up my shallow understanding of Advogato - keep track of how much flow each node has (some number).

dpc commented 4 years ago

After reading through http://www.squarefree.com/trust/trust.pdf

Looking at these trust metrics etc. I think there are some fundamental differences between how they work, and how I perceive a problem.

First, for me the trust is not binary. For some reason most (all?) the existing trust solutions are binary. Something is either trustworthy or not. In crev, there are levels (and maybe also "confidence"). So the trust is more nuanced. Even without any flow analysis, the trust level can be easily made to "dissipate" as the distance from the root grows.

Then, I consider everything subjective and want to embrace that subjectivity. There is no one global "this is trustworthy" vs "this isn't". It's all opinions, and depending on circumstances two different agents, with same knowledge, can reach different conclusions about what is trustworthy.

Third, a combination of the previous two: there is no fixed point where something is trustworthy. Even the same person, with the same WoT, can run multiple verification runs, with different parameters (trading: depth, required trust-level, required understanding level, required redundancy, etc for each other), and look for the most susceptible pieces of code to review themselves.

If I trust Alice and Alice trusts Eve, but I don't trust Eve, than that should lead me to re-evaluate my trust in Alice.

The subjectivity is the reason why I don't think it's necessarily the case. It would make sense if we considered everything objective. But if it's subjective, then it's totally fine to have a different opinion. With the confidence stuff, there will be a way to possibly overwrite inferred trust if needed. But that will require a conscious decision of someone analyzing their WoT.

Originally I wanted flow analysis pretty much just to make the redundancy option (number of required reviews) to be attack-proof. So that someone in the WoT can't just create sock-puppets to create "copies" of the review that would count as many different reviews.

So my thinking was - after finding all the reviews, just run some kind of a max-flow algorithm from all the matching reviews to the root, and drop/merge multiple reviews coming through the same trust-bottleneck to count as one.

dpc commented 4 years ago

The problem I'm looking at might not be a max-flow algorithm... . I'm only looking at flow of 1 everywhere, and I'm not interested in the flow value, but at actually pruning "duplicates". A naive version seems fast and simple:

However the result is not "stable" and depends on the order of how nodes were sorted, and how they picked the previous node to use.

But it seems to me that if every time when taking a previous node while having a choice of some other previous nodes, we note the unused previous nodes on the used previous node and allow subsequent "traverse the path back to root" to use the previous nodes that weren't taken like they were previous nodes of the currently consider node itself ... things might work OK? It seems to fix the problem where traversal uses a path that is needed for another traversal, instead of a path that no one else uses.

It's late and I'm probably rumbling, but maybe myself I'll be able to decipher it, once I get back to it.

dpc commented 2 years ago

https://arxiv.org/abs/2203.00671

dpc commented 2 years ago

https://www.youtube.com/watch?v=KsMtVthpkzI