Support detecting/matching uberjars

aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss

https://purldb.readthedocs.io/


Support detecting/matching uberjars #69

Open pombredanne opened 1 year ago

pombredanne commented 1 year ago

An Uberjar is a JAR combining many JARs in repackaged format. In contrast with a fatjar, it does not contain nested JAR-in-JAR. https://maven.apache.org/plugins/maven-shade-plugin/ is one of the tools that creates these.

The analysis of such a JAR is challenging because the contents of many JARs are mixed in a single JAR.
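
To make the fatjar/uberjar distinction concrete, here is a minimal Python sketch of a heuristic based only on entry names. It assumes the JAR was built with Maven, so that each bundled artifact left a `META-INF/maven/<groupId>/<artifactId>/pom.properties` entry behind. The function name `classify_jar` and the labels it returns are hypothetical and are not part of purldb. Shading can strip this metadata, so a "plain" result is not proof that nothing was bundled.

```python
import zipfile


def classify_jar(jar_file):
    """Roughly classify a JAR from its entry names.

    `jar_file` is a path or a binary file-like object.
    Returns (kind, coordinates), where coordinates is a sorted list of
    (groupId, artifactId) tuples found under META-INF/maven/.
    """
    with zipfile.ZipFile(jar_file) as jar:
        names = jar.namelist()

    # A fatjar keeps its dependencies as nested JAR-in-JAR entries.
    has_nested_jars = any(name.lower().endswith(".jar") for name in names)

    # Maven-built JARs carry META-INF/maven/<groupId>/<artifactId>/pom.properties.
    coordinates = set()
    for name in names:
        parts = name.split("/")
        if (
            len(parts) == 5
            and parts[:2] == ["META-INF", "maven"]
            and parts[4] == "pom.properties"
        ):
            coordinates.add((parts[2], parts[3]))

    if has_nested_jars:
        kind = "fatjar"
    elif len(coordinates) > 1:
        # Several Maven artifacts unpacked into one JAR: likely an uberjar.
        kind = "uberjar-candidate"
    else:
        kind = "plain"
    return kind, sorted(coordinates)
```

A JAR flagged as `uberjar-candidate` would still need content-based matching, since the metadata only says which artifacts may be present, not which files belong to them.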

pombredanne commented 1 year ago

@JonoYang ping

JonoYang commented 1 year ago

@pombredanne This is the current way uberjars are handled:

Index the packages detected from the POMs found in uberjars
- This is done so that we can match the directories of the individual packages to the directories in the uberjar

The rest of the matching process is handled by directory matching.
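
As an illustration of the directory matching idea, here is a simplified Python sketch. It is not purldb's actual fingerprinting code: it assumes an exact fingerprint per directory (a SHA-1 over the sorted SHA-1s of the files directly inside it), and the names `directory_fingerprints` and `match_directories` are hypothetical. When shading relocates packages, the class files are rewritten, so exact hashes like these would no longer match and an approximate fingerprint would be needed.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def directory_fingerprints(files):
    """Map each directory to a fingerprint of the files directly inside it.

    `files` maps a POSIX-style path to the file content as bytes.
    """
    hashes_by_dir = defaultdict(list)
    for path, content in files.items():
        parent = str(PurePosixPath(path).parent)
        hashes_by_dir[parent].append(hashlib.sha1(content).hexdigest())
    # Sorting makes the fingerprint independent of file order and file names.
    return {
        directory: hashlib.sha1("".join(sorted(hashes)).encode()).hexdigest()
        for directory, hashes in hashes_by_dir.items()
    }


def match_directories(index, uberjar_files):
    """Return {uberjar directory: package id} for directories found in `index`.

    `index` maps a directory fingerprint to a package identifier.
    """
    return {
        directory: index[fingerprint]
        for directory, fingerprint in directory_fingerprints(uberjar_files).items()
        if fingerprint in index
    }
```

The index would be built by running `directory_fingerprints` over each indexed package and recording which package every fingerprint came from.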

pombredanne commented 1 year ago

A good example of an uberjar is the closure-compiler. See also google/closure-compiler#4104

There seem to be several embedded and Jarjar-repackaged libraries in this uberjar that are weakly documented (e.g., not documented at all).

As such, neither the binary, the source, nor the git repo contains comprehensive documentation of the various bundled packages.

pombredanne commented 1 year ago

Another example is jline 2.12, which shades jansi. Its POM at https://repo1.maven.org/maven2/jline/jline/2.12/jline-2.12.pom shades org.fusesource/jansi 1.11 (see https://repo1.maven.org/maven2/org/fusesource/jansi/jansi/1.11/jansi-1.11.pom).

And jansi is shading groupId=org.fusesource.hawtjni artifactId=hawtjni-runtime version=1.8

... this is uberjars all the way down, as we have:

a top level JAR that is
- shading jline
- shading jansi
  - shading jansi-native
    - which itself contains multiple JARs, including one for each native platform (which we do not recognize correctly as Maven...)
  - shading hawtjni

And jansi-native 1.5 is made of many JARs, one for each OS: https://repo1.maven.org/maven2/org/fusesource/jansi/jansi-native/1.5/ We would need to index them all.
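
One way to give each per-OS JAR its own identity in the index is a Package URL with a `classifier` qualifier, which the purl `maven` type supports. The sketch below assumes the usual Maven file naming `<artifactId>-<version>[-<classifier>].jar`. The helper names are hypothetical, and the `linux64` classifier in the usage example is illustrative only and was not checked against the repository listing.

```python
from urllib.parse import quote


def classifier_from_filename(filename, artifact_id, version):
    """Return the classifier of a Maven JAR file name.

    Returns "" for the main JAR and None if the name does not match
    `<artifact_id>-<version>[-<classifier>].jar`.
    """
    prefix = f"{artifact_id}-{version}"
    if not filename.endswith(".jar") or not filename.startswith(prefix):
        return None
    rest = filename[len(prefix):-len(".jar")]
    if rest == "":
        return ""
    if rest.startswith("-") and len(rest) > 1:
        return rest[1:]
    return None


def maven_purl(group_id, artifact_id, version, classifier=None):
    """Build a pkg:maven purl, adding a classifier qualifier when present."""
    purl = f"pkg:maven/{quote(group_id)}/{quote(artifact_id)}@{quote(version)}"
    if classifier:
        purl += f"?classifier={quote(classifier)}"
    return purl


# Illustrative usage with an assumed file name:
name = "jansi-native-1.5-linux64.jar"
classifier = classifier_from_filename(name, "jansi-native", "1.5")
print(maven_purl("org.fusesource.jansi", "jansi-native", "1.5", classifier))
```

Keeping the classifier in the purl lets each platform JAR be indexed and matched separately while still grouping them under the same artifact and version.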