crev-dev / cargo-crev

A cryptographically verifiable code review system for the cargo (Rust) package manager.
Apache License 2.0

fetch all: Build a public repository with an update process from the fork lists and the possibility to add PRs #336

Status: Closed (chrysn closed this issue 4 years ago)

chrysn commented 4 years ago

The story behind cargo crev repo fetch all is currently to download dpc's crev-proofs and follow everything linked from there. This is a fast starting point but not a long-term solution, because a) it's centralized, and b) it mixes up whom dpc trusts with who claims to have a public repository. (Either way, it'll need better documentation.)

Suggested plan:

Either

Either way, then we could

Estimating the complexity of the follow-up steps, I'd be leaning towards having a plain link list -- both the scraping and the PRs would be much easier that way. known_cargo_owners may lead the way here.

It might just as well be transported in git repositories, such that a fetch all would also traverse (unsigned, fully untrusted, "just as good as found anywhere on the web") links filed with any of the discovered repositories. This would allow easier scaling out, because if someone comes up with an enhanced way to find crev proofs (say, scraping keybase), they wouldn't need to make a PR to update the central CI, but rather have that scraper run into their own repository and just link that.
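A minimal sketch of the link traversal described above, assuming a hypothetical get_links callback that returns the (unsigned, untrusted) URLs filed with a repository. The URLs and the max_repos cap are made up for illustration; this is not how cargo-crev implements fetch all.

```python
from collections import deque


def fetch_all(start_urls, get_links, max_repos=1000):
    """Breadth-first discovery of proof repositories.

    get_links(url) is a hypothetical callback returning the URLs linked
    from the repository at `url`. Links are treated as untrusted hints:
    they only widen what gets fetched, never what gets trusted.
    """
    queue = deque(dict.fromkeys(start_urls))  # de-duplicate, keep order
    seen = set(queue)
    visited = []
    while queue and len(visited) < max_repos:  # cap guards against link floods
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for real repositories (all URLs are made up).
LINKS = {
    "https://example.org/a": ["https://example.org/b", "https://example.org/c"],
    "https://example.org/b": ["https://example.org/a"],
    "https://example.org/c": ["https://example.org/d"],
    "https://example.org/d": [],
}

discovered = fetch_all(["https://example.org/a"], lambda u: LINKS.get(u, []))
```

Because any discovered repository can contribute links, someone running their own scraper would only need one existing repository to link to theirs.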

One tricky issue that might need further discussion is the privacy aspect (a user running their own git server could practically see every invocation of cargo crev repo fetch across the Internet, as can GitHub and GitLab), but this proposal doesn't make that any worse than it already is (every repository that somehow winds up in today's tree already receives the same kind of pings).

dpc commented 4 years ago

b) it's mixing up whom dpc trusts vs. who claims to have a public repository.

I'd like to add that a trust proof with level: none is very useful for "advertising, but not trusting some id", and I am using it quite liberally. Other users will see such a trust proof and fetch that id on fetch all, but without any initial trust. If people were liberal in using it, then new ids would be discoverable rather quickly and the id/trust network would be quite dense.

You could write a very simple automated bot where anyone can add an id + url and get an automatic trust proof with level: none. The only problem with advertising just about anyone is spam and other malicious content, huge repos, etc.
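A simplified sketch of the distinction, under the assumption that fetching follows every trust proof while trust only flows along proofs whose level is not none. The (from, to, level) tuples and the ids are hypothetical stand-ins for real proofs, and cargo-crev's actual trust computation is more involved than plain reachability.

```python
def reachable(root, edges):
    """Return every id reachable from `root` along (src, dst) edges."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Hypothetical trust proofs: (issuer, subject, level).
PROOFS = [
    ("alice", "bob", "high"),
    ("alice", "carol", "none"),   # advertised, but not trusted
    ("carol", "dave", "medium"),
]

all_edges = [(src, dst) for src, dst, _ in PROOFS]
trust_edges = [(src, dst) for src, dst, level in PROOFS if level != "none"]

fetched = reachable("alice", all_edges)    # ids whose repos would be downloaded
trusted = reachable("alice", trust_edges)  # ids whose reviews would count
```

In this toy graph carol and dave are fetched but not trusted, because the only path to them from alice passes through a level: none proof.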

Privacy problems can be addressed by using Tor, or by downloading bundles of proofs aggregated by other services, and so on. I wouldn't worry too much about it yet.

chrysn commented 4 years ago

level: none proofs sound like a straightforward way to go, and a builder for those should be doable easily.

Maybe a CI-based autobuilder could have its secret key encrypted in the repository, with a key only known to the operator and provided by the environment. The identity would be advertised as "don't give this any non-none trust, we'll do our best to only emit none signatures ourselves but don't take our word for it."

Will give it a try.

dpc commented 4 years ago

Sounds good.

chrysn commented 4 years ago

... and it's running: https://gitlab.com/crev-dev/auto-crev-proofs/ periodically wakes up and queries the GitHub and GitLab APIs for forks. (And external ones can be included in a text file, though I don't expect that this will be used often.)
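For illustration, a sketch of the GitHub half of such a fork query. It is not the code auto-crev-proofs runs; it only assumes the public GitHub REST endpoint GET /repos/{owner}/{repo}/forks, which returns a JSON array of repository objects that each carry a clone_url field. The owner name and the sample payload are made up, and no network request is issued here.

```python
import json


def forks_api_url(owner, repo, page=1, per_page=100):
    """Build the URL for one page of a repository's fork list on GitHub."""
    return (
        f"https://api.github.com/repos/{owner}/{repo}/forks"
        f"?per_page={per_page}&page={page}"
    )


def clone_urls(payload):
    """Extract the sorted clone URLs from a fork-list response body."""
    return sorted(item["clone_url"] for item in json.loads(payload))


# Trimmed, made-up response body in the shape the endpoint returns.
SAMPLE = json.dumps([
    {"full_name": "user-b/crev-proofs", "clone_url": "https://github.com/user-b/crev-proofs.git"},
    {"full_name": "user-a/crev-proofs", "clone_url": "https://github.com/user-a/crev-proofs.git"},
])

url = forks_api_url("example-owner", "crev-proofs")
forks = clone_urls(SAMPLE)
```

A periodic job would request successive pages until an empty array comes back, then merge the result with the manually maintained list of external repositories.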

That's about as much as I can do; if you're happy with it, you may want to give that ID a level:none trust, or even put it in as an additional starting point.

dpc commented 4 years ago

https://github.com/dpc/crev-proofs/commit/c324d06567368b03cc89c659723b3729cf083a96

I added it to https://github.com/crev-dev/cargo-crev/wiki/List-of-Proof-Repositories as well.

Awesome!

dpc commented 4 years ago

BTW. I'd add at least one id to the others.jsonl file as an example, because I have no idea what the format of it should be, and more people will get confused.

bestia-dev commented 4 years ago

I am building a web interface for cargo-crev. https://bestia.dev/cargo_crev_web/info/group_by_author/

It should eventually have all the review authors, plus a blacklist for bad, incomplete, or obsolete repos and bad authors. @Kornelski has already made a larger list of authors and uses it on lib.rs. I am still working on it. The web app has a dedicated crev ID and GitHub repo: https://github.com/cargo-crev-web/crev-proofs I think this could be the repo and ID that holds a list of all authors (except the bad ones). The web app could offer a link to export the author list in a form that can be imported into a local cargo-crev, maybe as a JSON file to download.

bestia-dev commented 4 years ago

I found https://gitlab.com/crev-dev/auto-crev-proofs. Can we say that this is the central repo that has the list of "all crev-proofs repos"? Is it also possible to add "blacklisted" repos, maybe by manually changing them to trusted:negative or similar? I would gladly use that list as the base for fetching repos for cargo_crev_web.

kornelski commented 4 years ago

auto-crev-proofs scrapes the list of forks, so it's a good place to discover pretty much all crev proof repos.

We haven't had to deal with abuse yet. When every fork is added automatically, then it is possible for an abuser to insert junk data. It could be spam (make sure to add rel="nofollow ugc" to links). It could be DoS (a proof repo with gigabytes of junk, or millions of trust proofs for various URLs making fetch take forever, etc.). I'm not sure if we should be doing anything about it yet.
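A minimal sketch of the spam precaution for a web front end, assuming links are rendered by hand rather than through a template engine. The function name is hypothetical; it escapes the untrusted URL, refuses to link anything that is not http(s), and marks the link as user-generated content.

```python
from html import escape


def render_untrusted_link(url):
    """Render a URL taken from an untrusted proof repo as HTML."""
    safe = escape(url, quote=True)  # neutralise <, >, &, and quotes
    if not url.startswith(("https://", "http://")):
        return safe  # e.g. javascript: URLs are shown as plain text only
    return f'<a href="{safe}" rel="nofollow ugc">{safe}</a>'


link = render_untrusted_link("https://example.org/?a=1&b=2")
plain = render_untrusted_link("javascript:alert(1)")
```

The DoS cases (oversized repos, floods of trust proofs) would need separate limits on clone size and on the number of URLs followed per repository.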

bestia-dev commented 4 years ago

@kornelski, is your list the same as auto-crev-proofs, or do you have more repos? I have 55 repos so far: https://bestia.dev/cargo_crev_web/info/group_by_author/ I will check and add the rest from the auto-crev-proofs list (around 20): https://bestia.dev/cargo_crev_web/reserved_folder/list_new_author_id/ I have a blacklist of incomplete or obsolete repos:

[
    "https://github.com/confio/crev-proofs",
    "https://github.com/dmerejkowsky/crev-proofs",
    "https://github.com/jonas-schievink/crev-proofs",
    "https://github.com/scott-wilson/crev-proofs",
    "https://github.com/sphinxc0re/crev-proofs",
    "https://github.com/Thinkofname/crev-proofs",
    "https://github.com/thorhs/crev-proofs",
    "https://github.com/adeschamps/crev-proofs",
    "https://github.com/bjorn3/crev-proofs",
    "https://github.com/cole-h/crev-proofs",
    "https://github.com/dirvine/crev-proofs",
    "https://github.com/Eraden/crev-proofs",
    "https://github.com/ffranr/crev-proofs",
    "https://github.com/alaric/crev-proofs",
    "https://github.com/Flakebi/crev-proofs",
    "https://github.com/JamesHinshelwood/crev-proofs",
    "https://github.com/LaurenceGA/crev-proofs",
    "https://github.com/maccam912/crev-proofs",
    "https://github.com/crev-dev/crev-proofs",
    "https://github.com/Alxandr/crev-proofs",
    "https://github.com/pimotte/crev-proofs",
    "https://github.com/Alexendoo/crev-proofs",
    "https://github.com/ivanceras/crev-proofs",
    "https://github.com/Gaelan/crev-proofs",
    "https://github.com/hgzimmerman/crev-proofs",
    "https://github.com/leo-lb/crev-proofs",
    "https://github.com/sgeisler/crev-proofs",
    "https://github.com/otavio/crev-proofs",
    "https://github.com/frigus02/crev-proofs",
    "https://github.com/jplatte/crev-proofs",
    "https://github.com/mchesser/crev-proofs",
    "https://github.com/braunse/crev-proofs",
    "https://github.com/traxys/crev-proofs",
    "https://github.com/alexmaco/crev-proofs",
    "https://github.com/VictorKoenders/crev-proofs",
    "https://github.com/bmhenry/crev-proofs",
    "https://github.com/gilescope/crev-proofs",
    "https://github.com/stusmall/crev-proofs",
    "https://github.com/bwbroersma/crev-proofs"
]

kornelski commented 4 years ago

I use auto-crev-proofs and repos listed in the wiki.

bestia-dev commented 4 years ago

cargo_crev_web now has these totals: 2020-05-27, authors: 55, crates: 514, reviews: 886

If you have more, I would like to know and to add them.

dpc commented 4 years ago

auto-crev-proofs probably has every publicly known CrevID already. :) 55 authors seems about right.

there are dozens of us