anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Gradle support (non-lockfiles) #2472

Open tobyweston opened 6 months ago

tobyweston commented 6 months ago

Apologies if I missed it, but I looked through the documentation (README) and found some merge requests on the topic, and it looks to me like Syft works with Gradle lockfiles but not with the other ways of declaring dependencies.

I was hoping it would have an equivalent of the pom.xml parsing it does when processing a folder/directory (which I think is what the java-pom cataloger does).

Happy to have a go at contributing if it's missing. I was thinking of support for the build.gradle.kts (Kotlin) and build.gradle (Groovy) styles of declaring dependencies.

Also happy to be corrected if I've misunderstood the use here. I'm working with the file system rather than container images, and I'm looking to produce BOMs without actually building the source (as far as possible), so working from source and dependency management tooling (Maven, Gradle, SBT, NPM, etc.). It's like the difference between scanning package.json and package-lock.json, where for the latter I'd actually have to invoke npm.

Why is this needed:

Full support for different project "styles" using Gradle so I can support a disparate set of code bases.

Additional comments:

Did I miss some extended documentation that describes each of the "catalogers"? For example, something describing whether the Gradle lockfile cataloger works with both legacy and newer formats?

kzantow commented 6 months ago

Hi @tobyweston -- you're right. Today, Syft only supports Gradle lockfiles. You can see the different catalogers for Java here: https://github.com/anchore/syft/tree/main/syft/pkg/cataloger/java. Adding support for alternatives like the build.gradle and build.gradle.kts as you mentioned would be great, and definitely sounds like something Syft could support. I don't know a lot about the Kotlin version (I'm assuming it's full-blown Kotlin), but I will note that there is some amount of complexity with the .gradle version, which, as I understand it is a full-blown Groovy file... so could be a bit tricky to parse well, but also might only need a smallish subset of Groovy parsing to handle the majority of files.

We always welcome PRs and would be happy to help shepherd something like this. Whether or not someone decides to work on this, it would be really helpful to have links to a bunch of real-world examples, so if you happen to have some, linking them here would be great!

tobyweston commented 6 months ago

Thanks for the comments. I'll have a look around that folder and the parser code. I'm not a Go developer, so don't expect much! Worst case, I can help elsewhere with testing and examples. I'm looking to build a fairly big BOM catalogue, so I have some "enterprise" use cases.

For future folks reading, also spotted related https://github.com/anchore/syft/pull/707

PS loving your work - I got set up and producing BOMs in < 5 mins via brew and :dir. 🙏

tobyweston commented 6 months ago

Just wondering if there's a lightweight way to parse Kotlin and Groovy files... I have a simplistic use case where, as an MVP, I'd only need name, version and details from the cataloger itself.

I'd be tempted to regex it rather than try to build some sort of AST. 🤔

kzantow commented 6 months ago

> Just wondering if there's a lightweight way to parse Kotlin and Groovy files... I have a simplistic use case where, as an MVP, I'd only need name, version and details from the cataloger itself.
>
> I'd be tempted to regex it rather than try to build some sort of AST. 🤔

I only mention parsing, as we were able to improve Rebar package identification by implementing some logic to handle at least some basics about Erlang, which is used for the rebar.lock files. I suspect Kotlin- and Groovy-based files might fall into a similar category, since they are all using programming languages.

That said, we have a lot of regex usage to pull details out of various file formats and an approach like that could definitely work for at least an initial implementation -- I'd proffer that finding something here is better than nothing, as long as it's not inaccurately including a bunch of things. The only thing I'd ask if someone were to work on any implementations, is to find a lot of examples and include them as test cases for the parser (alternately, just linking a bunch of common real world examples in this issue for whoever may work on it in the future would be very helpful!).
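For anyone picking this up, here is a minimal sketch of what the regex approach might look like. It is not taken from Syft: the `Dependency` type, `parseGradleDependencies` function, and sample build file are all hypothetical names made up for illustration. It assumes dependencies are declared in the single-string `group:name:version` notation, which both the Groovy and Kotlin DSLs accept, and it deliberately skips anything with an interpolated version (such as `$kotlinVersion`) so that it does not report things it cannot resolve. Map notation, version catalogs, and platform/BOM-managed versions are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Dependency is a hypothetical record for this sketch; it is not a Syft type.
type Dependency struct {
	Configuration string // e.g. "implementation", "testImplementation"
	Group         string
	Name          string
	Version       string
}

// depPattern matches the string notation shared by the Groovy and Kotlin DSLs:
//
//	implementation 'group:name:version'
//	implementation("group:name:version")
//
// Lines that start with a comment marker do not match, because the
// configuration name must be the first non-whitespace token on the line.
var depPattern = regexp.MustCompile(
	`^\s*([A-Za-z]+)\s*\(?\s*["']([^:"'\s]+):([^:"'\s]+):([^:"'\s]+)["']\s*\)?`)

// parseGradleDependencies scans a build file line by line and returns only
// the dependencies whose coordinates are fully literal.
func parseGradleDependencies(content string) []Dependency {
	var deps []Dependency
	for _, line := range strings.Split(content, "\n") {
		m := depPattern.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		// Skip interpolated versions ("$v" or "${v}"): resolving them would
		// need real Groovy/Kotlin evaluation, and guessing risks bad output.
		if strings.Contains(m[4], "$") {
			continue
		}
		deps = append(deps, Dependency{
			Configuration: m[1],
			Group:         m[2],
			Name:          m[3],
			Version:       m[4],
		})
	}
	return deps
}

// sampleBuildFile is an illustrative input, not a real-world project file.
const sampleBuildFile = `
dependencies {
    implementation 'com.google.guava:guava:32.1.2-jre'
    testImplementation("org.junit.jupiter:junit-jupiter:5.10.0")
    implementation "org.jetbrains.kotlin:kotlin-stdlib:$kotlinVersion"
    implementation group: 'org.example', name: 'demo', version: '1.0'
}
`

func main() {
	for _, d := range parseGradleDependencies(sampleBuildFile) {
		fmt.Printf("%s %s:%s:%s\n", d.Configuration, d.Group, d.Name, d.Version)
	}
}
```

With the sample above, only the first two declarations are picked up; the interpolated and map-notation lines are ignored rather than guessed at. Real-world examples posted in this issue would show how often those skipped forms actually occur.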

tobyweston commented 6 months ago

I was chatting with some colleagues and it looks like there's a lot of variability with Gradle. I gather it doesn't fall into the same camp as a declarative build definition file (like Maven's XML-based POM) but is "looser". Teams can change the rules, so we might not see obvious patterns across all variations.

The suggestion was to use Kotlin or Groovy itself to work with the objects at a language level, so back to the AST in my mind. I think it's going to be hard work. The other suggestion we had was to preprocess Gradle into Maven's POM (which is there or thereabouts available via Gradle plugins). So could it be that this ends up as a documented suggestion rather than a code change to Syft?

Is there a place we can catalogue examples (here, perhaps, for now)? I can try to get these together, but I will have to anonymise them and include examples rather than full-blown code bases.