aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/. Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

RFC: Introduce idea of embedded package #177

Open JonoYang opened 1 year ago

JonoYang commented 1 year ago

We often encounter packages within other packages. For example, some Maven JARs contain other JARs, as is the case with Spring Boot and other uber JARs. We currently have no way to record this relation, where a group of packages is contained within another package. Having it would help when we match an uber JAR: we could look up all the packages contained within it and return with the match not only the uber JAR's package data, but also the package data for every embedded package.

This will probably be a new foreign key on Packages with the name embedded_packages.
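To make the proposal concrete, here is a minimal sketch of how a match could return the matched package together with everything embedded in it. The `Package` dataclass below is a hypothetical stand-in, not the actual purldb model, and `match_result` is an illustrative helper name; only the `embedded_packages` field name comes from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    # Hypothetical stand-in for the purldb Package model.
    purl: str
    # Packages contained within this package (the proposed relation).
    embedded_packages: List["Package"] = field(default_factory=list)


def match_result(package):
    """
    Return the purl of `package` followed by the purls of all packages
    embedded in it, directly or transitively, without duplicates.
    """
    seen = set()
    result = []
    stack = [package]
    while stack:
        current = stack.pop()
        if current.purl in seen:
            # Skip packages already reported (also guards against cycles).
            continue
        seen.add(current.purl)
        result.append(current.purl)
        # Reverse so embedded packages are visited in their listed order.
        stack.extend(reversed(current.embedded_packages))
    return result
```

With an uber JAR that lists two embedded JARs, `match_result` would return the uber JAR's purl first and then the two embedded purls. Whether the real relation is a foreign key or some other kind of link is left open here.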

JonoYang commented 1 year ago

One way to populate the embedded package field for a Package would be to do it at package indexing time.

After we fingerprint the codebase of the package we want to index, we check whether the directory fingerprints we just computed match directory fingerprints from other packages, excluding other versions of the package we are currently indexing. If there is a match, we can say that the package we're indexing is embedded in another.

This step can also show us what packages are embedded within the package we're currently indexing.
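A rough sketch of that lookup, under stated assumptions: the fingerprint index is modelled as a plain dict from fingerprint string to a set of `(purl, name)` pairs, and "different versions of the same package" is approximated by comparing package names. Both the index shape and the function name `find_related_packages` are hypothetical, not purldb APIs. The sketch only finds candidate packages that share a directory fingerprint; it does not decide which package embeds which.

```python
def find_related_packages(name, directory_fingerprints, index):
    """
    Return the purls of already-indexed packages that share at least one
    directory fingerprint with the package being indexed.

    `name` is the name of the package being indexed, used to skip other
    versions of the same package.
    `index` maps a fingerprint to a set of (purl, name) pairs (assumed shape).
    """
    related = set()
    for fingerprint in directory_fingerprints:
        for other_purl, other_name in index.get(fingerprint, ()):
            if other_name == name:
                # Another version of the package we are indexing: ignore.
                continue
            related.add(other_purl)
    return related
```

Each purl returned is a candidate for an embedding relation with the package being indexed; a further step would be needed to tell whether it contains that package or is contained in it.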

armijnhemel commented 1 year ago

Related: https://github.com/nexB/purldb/issues/163