EmbarkStudios / cargo-deny

❌ Cargo plugin for linting your dependencies 🦀
http://embark.rs
Apache License 2.0

Add a check for crates that do not match the referenced git repository state #644

Open weiznich opened 3 months ago

weiznich commented 3 months ago

Is your feature request related to a problem? Please describe.

The recent findings in xz-utils, among other things, have shown that backdoors or other vulnerabilities can be introduced by modifying only the released source code (as opposed to the source code checked in to the git repository).

Because cargo publish uses the local copy of the source code, it is open to the same attack vector. As far as I'm aware, crates.io does not perform any validation of the uploaded source code. This could be a huge issue, especially for things like proc-macros or build scripts.

Describe the solution you'd like

I would like to see an additional check in cargo deny that verifies that the released .crate file contains the same source code as the referenced git repository. Cargo embeds this information via a .cargo_vcs_info.json file. It would likely be useful to have additional options to configure an allow list and to deny crates that lack this information.
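To make the requested behaviour concrete, here is a minimal sketch of how such a check could decide its outcome for a single crate. It is not existing cargo-deny code: the names `CrateSource`, `Policy`, `Verdict`, `evaluate` and the `deny_missing_vcs_info` option are hypothetical, and the actual comparison against the repository is passed in as a closure so the decision logic stays independent of how the sources are fetched.

```rust
use std::collections::HashSet;

/// What a packaged crate tells us about its origin (hypothetical type).
/// `recorded_commit` is `None` when the crate was published without a
/// `.cargo_vcs_info.json` file.
pub struct CrateSource<'a> {
    pub name: &'a str,
    pub recorded_commit: Option<&'a str>,
}

/// Hypothetical configuration for the proposed check.
pub struct Policy<'a> {
    /// Crates that are exempt from the check.
    pub allow: HashSet<&'a str>,
    /// Whether a crate without recorded VCS information is an error.
    pub deny_missing_vcs_info: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// The crate is on the allow list, so nothing was compared.
    Allowed,
    /// No VCS information, and the policy does not deny that.
    Skipped,
    /// No VCS information, and the policy denies that.
    MissingVcsInfo,
    /// The packaged sources match the recorded commit.
    Matches,
    /// The packaged sources differ from the recorded commit.
    Differs,
}

/// Decide the outcome for one crate. `sources_match` stands in for the
/// expensive step (fetching the repository at the given commit and comparing
/// it with the contents of the `.crate` file); it is only called when needed.
pub fn evaluate(
    krate: &CrateSource<'_>,
    policy: &Policy<'_>,
    sources_match: impl FnOnce(&str) -> bool,
) -> Verdict {
    if policy.allow.contains(krate.name) {
        return Verdict::Allowed;
    }
    match krate.recorded_commit {
        None if policy.deny_missing_vcs_info => Verdict::MissingVcsInfo,
        None => Verdict::Skipped,
        Some(commit) => {
            if sources_match(commit) {
                Verdict::Matches
            } else {
                Verdict::Differs
            }
        }
    }
}
```

Under these assumptions, an allow-listed crate is never compared, and a crate without `.cargo_vcs_info.json` yields either `Skipped` or `MissingVcsInfo` depending on the policy.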

paolobarbolini commented 3 months ago

We're doing it in https://crates.io/crates/cargo-goggles and I've heard lib.rs is also implementing it. crates.io might get it too at some point.

Jake-Shadle commented 3 months ago

My first preference would be for crates.io to support it, since then everyone would get the benefit rather than just cargo-deny users. In addition, cargo-deny is meant to complete quickly: the only real bottlenecks are cargo fetch and retrieving the advisory database, so there would need to be an acceleration mechanism to make this feasible for cargo-deny. That being said, I have already thought about having a git repo/DB that collates licensing information for crates.io crates in one location, so that users don't need to specify clarifications (as often) and can instead rely on machine-read data as well as user curation. Part of that would be checksumming files in the source repo while finding the license information, so combining the two in one location would be feasible.
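As a rough illustration of the checksum idea, the sketch below compares two precomputed checksum listings, one for the files in the packaged crate and one for the files in the source repository at the referenced commit. Everything here is an assumption made for illustration: the function name `mismatched_files`, the representation of checksums as strings keyed by path relative to the crate root, and the list of files that are skipped because Cargo generates or rewrites them while packaging.

```rust
use std::collections::BTreeMap;

/// Files that Cargo adds or rewrites when packaging a crate, so they are not
/// expected to match the repository byte for byte. This list is an assumption
/// of the sketch; a real check would want to verify these files separately
/// rather than skip them.
const GENERATED_BY_CARGO: [&str; 3] = [".cargo_vcs_info.json", "Cargo.toml", "Cargo.toml.orig"];

/// Return the paths of packaged files whose checksum is missing from, or
/// different in, the repository listing. Both maps go from a path relative to
/// the crate root to a precomputed checksum string.
///
/// Files that exist only in the repository are ignored, since crates commonly
/// exclude files from the published package.
pub fn mismatched_files(
    packaged: &BTreeMap<String, String>,
    repo: &BTreeMap<String, String>,
) -> Vec<String> {
    packaged
        .iter()
        // Skip files that Cargo itself generates or normalizes.
        .filter(|(path, _)| !GENERATED_BY_CARGO.contains(&path.as_str()))
        // Keep files that are absent from the repo or have a different checksum.
        .filter(|(path, checksum)| repo.get(*path) != Some(*checksum))
        .map(|(path, _)| path.clone())
        .collect()
}
```

With a shared, precomputed listing for the repository side, the per-crate work reduces to a map lookup per file, which is the kind of acceleration the comment above asks for. An empty result means every compared file in the package matches the repository.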