EmbarkStudios / cargo-deny

❌ Cargo plugin for linting your dependencies 🦀
http://embark.rs
Apache License 2.0

Exhaustive license searching #159

Open iliana opened 4 years ago

iliana commented 4 years ago

Is your feature request related to a problem? Please describe.

In Bottlerocket, we use cargo-deny for enforcing a license policy, as well as bottlerocket-license-scan to identify license files in vendored sources to copy into a final OS image.

bottlerocket-license-scan grew a clarification feature very much like deny.toml to handle situations where a license file doesn't scan as anything SPDX knows about (within reasonable confidence), or where a license is scanned that isn't part of the crate's license string. I believe this logic is similar to something we saw in cargo-deny v0.2, but maybe I'm misremembering.
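To make the second case concrete (a scanned license that is not part of the crate's license string), here is a minimal Rust sketch. It is an illustration only, not how cargo-deny or bottlerocket-license-scan implement it: `declared_ids` and `undeclared_licenses` are hypothetical names, and the tokenizer is deliberately naive rather than a real SPDX expression parser.

```rust
use std::collections::BTreeSet;

/// Split a crate's `license` string into the identifiers it names.
/// Hypothetical helper: a naive tokenizer, not a full SPDX expression parser.
/// It also accepts the legacy `MIT/Apache-2.0` slash form.
fn declared_ids(license_expr: &str) -> BTreeSet<String> {
    license_expr
        .split(|c: char| c.is_whitespace() || matches!(c, '(' | ')' | '/'))
        .filter(|tok| !tok.is_empty())
        // Drop the expression operators; everything else is treated as an identifier.
        .filter(|tok| !matches!(tok.to_ascii_uppercase().as_str(), "AND" | "OR" | "WITH"))
        .map(|tok| tok.to_ascii_lowercase())
        .collect()
}

/// Return the scanned license identifiers that the declared expression never mentions.
/// Comparison is case-insensitive; the input order of `scanned` is preserved.
fn undeclared_licenses(license_expr: &str, scanned: &[&str]) -> Vec<String> {
    let declared = declared_ids(license_expr);
    scanned
        .iter()
        .filter(|id| !declared.contains(&id.to_ascii_lowercase()))
        .map(|id| id.to_string())
        .collect()
}
```

For a crate declaring `MIT OR Apache-2.0` whose vendored sources also scan as some other license, `undeclared_licenses` would return that other identifier, which is the point where a clarification entry would be needed.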

We've seen a decent number of -sys crates that vendor code and don't reflect the vendored license in the license field. Some examples are backtrace-sys, zstd-sys, and (now that I'm writing a bottlerocket-license-scan clarify.toml for cargo-deny itself) libgit2-sys, which vendors libgit2 and transitively vendors some of libgit2's dependencies. Fortunately it all looks permissive, but there's still a good amount of work for software archaeologists to pick apart.

Describe the solution you'd like

I'd just like to clarify (heh) whether you intend to keep your documented approach or adjust it:

Note however, that cargo-deny does not (currently) exhaustively search the entirety of the source code of every crate to find every possible license that could be attributed to the crate, as there are a ton of edge cases to that approach.

Like you say, there are a ton of edge cases, and those edges are very sharp and pointy.

I'd like to actually go to all these upstreams and help them reflect their total licenses properly, and I think cargo-deny can help with that, but it might need to grow exhaustive license searching again on an opt-in basis to assist with that.
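As a rough idea of what the first step of an opt-in exhaustive search could look like, here is a minimal Rust sketch that walks a crate's source tree and collects files whose names look like license texts, including ones buried in vendored subdirectories. Everything in it is an assumption for illustration: `find_license_files`, `looks_like_license`, and the prefix list are hypothetical, and this is not how cargo-deny currently works. It also only finds candidate files; identifying which license each file contains is a separate step.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File-name prefixes that commonly mark license texts.
/// Assumed and incomplete; a real tool would need a more careful list.
const LICENSE_PREFIXES: [&str; 5] = ["license", "licence", "copying", "notice", "unlicense"];

/// Case-insensitive check of the file name against the prefixes above.
fn looks_like_license(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|name| {
            let lower = name.to_ascii_lowercase();
            LICENSE_PREFIXES.iter().any(|p| lower.starts_with(*p))
        })
        .unwrap_or(false)
}

/// Walk `root` iteratively and return every license-like file, sorted by path.
/// Symlinks are not followed, so a link cycle cannot cause an endless walk.
fn find_license_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file() && looks_like_license(&path) {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Pointed at a -sys crate's directory, this would list the top-level license file alongside any license files inside the vendored C sources, which is the part the license field tends to miss.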

Jake-Shadle commented 4 years ago

My current plan for this is #121. clearlydefined.io could act as a supplement to local license checking, and allows anyone to submit curations to a central source of truth that (hopefully) eventually reach the actual upstream repo. But yes, we've noticed the exact same thing ourselves: basically every crate that links C/C++ code seems to completely ignore the license requirements of that code.