Introduce "collection" projects for better usage of hierarchical view

rkg-mm commented 1 year ago

Current Behavior:

After implementation of #84, grouping different versions or environments into one project is a great feature and improves the overview in the project list. But it has one drawback: You now need to expand a project to see the current issue count, if the project is just a "meta"-level to group versions or sub-projects.

Proposed Behavior:

When selecting a project Classifier, there could be an option like "Project Collection". This would be applied to intermediate versions like "Project X - Android App" (containing multiple versions as children) or "Project X - Server" (containing multiple microservices below).

If "Project collection" is selected, additional configuration shall be shown: A dropdown to select the "Metrics type", which can be one of:

Latest uploaded BOM
Sum of direct children metrics
Specific child

The UI and API should then prevent uploading a BOM and managing components manually for such a collection project, and instead show the numbers of its children, according to the configuration. This would give a better overview in the UI and allow additional use cases like https://github.com/DependencyTrack/dependency-track/issues/657, e.g. by later replacing the component view for collection projects with a metrics view.

The options logic should be:

Latest uploaded BOM: The numbers of the (direct) child for which a BOM was uploaded most recently shall be shown on the parent. This would be helpful for versioned software, so you can always see the latest version's data on the parent.

Sum of direct children metrics: The numbers of all (direct) children shall be summed up and shown on the parent. This could be interesting, e.g. if you have a meta level "Cloud Project X - Productive environment" with microservices as children, so you see the summary of the complete production environment.

Specific child: If selected, offers another input field to select a specific child somehow*. This can be useful in cases where e.g. "Cloud Project X" has direct children for "develop", "qa", "productive" (see above example), so you can always show the "productive" child's numbers on the parent, to focus on them in the views. PROBLEM: You cannot select a child while creating the parent, and it would also introduce a circular reference. Possible alternative: Define a tag name in a textbox input instead, which is then used to label the child. The first child found with that tag would then be taken.

When a child is itself a collection, the numbers calculated for that collection should be used, not those of all its children. This way you can configure a nice hierarchy yourself.
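To make the three options and the nested-collection rule concrete, here is a minimal sketch in Python. It is not Dependency-Track code: the `Project` class, the `CollectionLogic` enum, and the single `vulnerabilities` number standing in for "the metrics" are all assumptions for illustration. It also assumes the tag-based alternative for "Specific child", and that a collection with no matching child shows 0.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class CollectionLogic(Enum):
    # Hypothetical names for the proposed "Metrics type" dropdown.
    NONE = "none"  # ordinary project with its own BOM
    LATEST_UPLOADED_BOM = "latest"
    SUM_OF_DIRECT_CHILDREN = "sum"
    SPECIFIC_CHILD_BY_TAG = "tag"


@dataclass
class Project:
    name: str
    vulnerabilities: int = 0  # own number; ignored for collections
    last_bom_import: Optional[int] = None  # assumed upload timestamp
    tags: Set[str] = field(default_factory=set)
    logic: CollectionLogic = CollectionLogic.NONE
    collection_tag: Optional[str] = None  # used by SPECIFIC_CHILD_BY_TAG
    children: List["Project"] = field(default_factory=list)


def displayed_vulnerabilities(project: Project) -> int:
    """Number to show for a project in the project list."""
    if project.logic is CollectionLogic.NONE:
        return project.vulnerabilities

    if project.logic is CollectionLogic.SUM_OF_DIRECT_CHILDREN:
        # A child that is itself a collection contributes its own
        # calculated number, not the raw numbers of its children.
        return sum(displayed_vulnerabilities(c) for c in project.children)

    if project.logic is CollectionLogic.LATEST_UPLOADED_BOM:
        dated = [c for c in project.children if c.last_bom_import is not None]
        if not dated:
            return 0
        latest = max(dated, key=lambda c: c.last_bom_import)
        return displayed_vulnerabilities(latest)

    # SPECIFIC_CHILD_BY_TAG: the first child carrying the tag wins.
    for child in project.children:
        if project.collection_tag in child.tags:
            return displayed_vulnerabilities(child)
    return 0
```

With this shape, a "Project X" collection that sums an "Android App" collection (latest BOM) and a "Server" collection (sum of microservices) only ever looks at its two direct children.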

Performance: Live calculation would be a problem for a big project structure. Therefore, the existing vulnerability analysis task should get an additional final step that re-calculates collection metrics. If we have some kind of information about which projects changed during the analysis, we could possibly limit the re-calculation to only those trees in which numbers changed. Additionally, when adding/deleting sub-projects, changing a sub-project's parent, changing the collection configuration, or uploading a BOM (or changing a tag that matches the parent's tag configuration, if this approach is used), the metrics for the hierarchy above the changed project would have to be re-calculated immediately. The results need to be cached.
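A rough sketch of what "re-calculate only the hierarchy above the changed project" could look like, in Python. Everything here is hypothetical: the `Node` class, the plain dict used as the cache (keyed by project name, assumed unique), and the simplification that every collection uses the "sum of direct children" logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    name: str
    own_count: int = 0  # number from the project's own BOM
    is_collection: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def current_count(node: Node, cache: Dict[str, int]) -> int:
    # Collections are read from the cache, ordinary projects directly.
    return cache.get(node.name, 0) if node.is_collection else node.own_count


def rebuild_all(node: Node, cache: Dict[str, int]) -> None:
    """Full rebuild, e.g. as the last step of the analysis task."""
    for child in node.children:
        rebuild_all(child, cache)  # children first (post-order)
    if node.is_collection:
        cache[node.name] = sum(current_count(c, cache) for c in node.children)


def refresh_ancestors(changed: Node, cache: Dict[str, int]) -> List[str]:
    """Re-calculate only the collections above the changed project."""
    refreshed = []
    node = changed.parent
    while node is not None:
        if node.is_collection:
            cache[node.name] = sum(current_count(c, cache) for c in node.children)
            refreshed.append(node.name)
        node = node.parent
    return refreshed
```

Walking upwards from the changed project refreshes each collection after the one below it, and sibling subtrees are served from the cache without being visited.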

Considerations to make:

We need to ensure that collection projects are not counted when calculating metrics for the portfolio, otherwise we would count duplicates.
What if someone doesn't have access to all child projects, but only to a subset? I think in this case they should still see the full metrics, even if they only see a subset of the projects. I can imagine, for example, a controlling function having access to top-level metrics without seeing into the detailed projects. Also, other solutions would not work with cached data.
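The double-counting concern from the first consideration can be shown with a tiny Python example. The dictionaries and the `is_collection` flag are made up for illustration and are not the Dependency-Track data model.

```python
# Hypothetical flat project list: one collection and its two children.
projects = [
    {"name": "Project X - Server", "is_collection": True, "vulnerabilities": 7},
    {"name": "svc-a", "is_collection": False, "vulnerabilities": 3},
    {"name": "svc-b", "is_collection": False, "vulnerabilities": 4},
]


def naive_portfolio_total(projects):
    # Counts the children once directly and once more via the collection.
    return sum(p["vulnerabilities"] for p in projects)


def portfolio_total(projects):
    # Skip collections: their numbers are derived from other projects.
    return sum(p["vulnerabilities"] for p in projects if not p["is_collection"])
```

Here the naive total counts the same 7 vulnerabilities twice, while the filtered total counts them once.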

robertlagrant commented 1 year ago

Could this work in the following scenario: I'd like to group my software project X by version, but each version of it also includes my software projects Y and Z:

                      X                                   Y                         Z
                      |                                   |                         |
      /---------------|---------------\          /--------|-------\        /--------|-------\
     1.0             1.1             2.0        3.2      3.4    3.4.1     5.4      5.6     5.7
      |               |               |
  /-------\       /-------\       /-------\
Y 3.2   Z 5.4   Y 3.4   Z 5.6  Y 3.4.1  Z 5.7

Will this allow that arbitrarily nested structure?

roadSurfer commented 1 year ago

My understanding is the current parent/child functionality already allows this. You would go into the UI for Y 3.2 and set its parent to X 1.0, then do the same for Z 5.4. Rinse and repeat for Y 3.4, setting the parent to X 1.1 and so on. You can then view the Y and Z projects nested under their respective parents in the project list.

What does not happen at the moment is the X projects including violation/vuln information from the children.

Also, a child can only have one parent, but if you are going down the route of wanting multiples, that seems to me to be more about recording deployments.

robertlagrant commented 1 year ago

Thanks - makes sense.

Also, a child can only have one parent, but if you are going down the route of wanting multiples, that seems to me to be more about recording deployments.

The example isn't exactly about deployments. It's that project Y is itself a parent-child relationship, as it has multiple versions to it, and to my understanding this is the way we record multiple versions of a thing, and specific versions of project Y are components of project X.

roadSurfer commented 1 year ago

You can do that as well, or instead. It doesn't affect the project parent/child relationship.

robertlagrant commented 1 year ago

Awesome. How does that work, if children can only have one parent?

roadSurfer commented 1 year ago

This is where I begin to struggle with it, as I am not quite sure what the parent/child relationship is supposed to be modelling. Perhaps it is meant to relate more to the source structure? Anyway, Y 3.4 can only have one parent, and that would be X 1.1 in your example, but it can still appear as a component in many other projects.

robertlagrant commented 1 year ago

From memory I think it was meant to be generic, but it's hard to see how generic it can be if there's only one parent allowed. I voted for having a specific concept of "version" and a generic concept of tagging, for the record!

But you're right; in this case I can use component for one relationship type and parent/child for versioning. I'll try that.

rkg-mm commented 1 year ago

This structure was meant more for a hierarchical organization of projects. Multi-parent would get complicated with display logic, recursion logic, ACLs, etc.

Also, your use case sounds more like you have different components (= one project each, maybe sub-projects of a bigger group of projects) and then an application project whose BOM refers to those components. I'm not sure whether DT today has logic to model this nicely, i.e. being able to relate from a BOM to another component-project and inherit its vulnerabilities etc. Likely not, but this is what I think would be needed to do this correctly. Basically, if a project had a PURL matching a PURL in a BOM, then the vulns should probably be inherited?
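As a rough illustration of that PURL idea (a sketch only; as said, DT likely has no such logic today), the lookup could be as simple as the following Python. The function name, the exact-string PURL matching, and the placeholder vulnerability IDs are all assumptions.

```python
def inherited_vulnerabilities(bom_purls, project_vulns_by_purl):
    """Collect vulnerabilities of projects whose PURL appears in a BOM.

    bom_purls: PURLs of the components listed in the application's BOM.
    project_vulns_by_purl: hypothetical index from a project's own PURL
        to the set of vulnerability IDs found in that project.
    """
    inherited = set()
    for purl in bom_purls:
        # Exact string match; real PURL comparison may need normalization.
        inherited |= project_vulns_by_purl.get(purl, set())
    return inherited


# Made-up data: project Y 3.2 is tracked as its own project.
project_vulns_by_purl = {
    "pkg:generic/project-y@3.2": {"VULN-PLACEHOLDER-1", "VULN-PLACEHOLDER-2"},
}
x_bom = ["pkg:generic/project-y@3.2", "pkg:generic/project-z@5.4"]
```

In this example, the BOM of X would inherit the two placeholder vulnerabilities of Y 3.2, while Z 5.4 contributes nothing because no project with that PURL is known.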

ragaskar commented 1 year ago

I mentioned in the OWASP Slack that this feature would meet a use-case my organization has, but thought I'd add those details here as well as a +1 for this feature:

We ship a "parent" product that in turn is composed of multiple, internally-developed "child" products (which have "grandchildren" dependencies). Although the internally developed products use package managers, they are generally not part of any kind of package ecosystem themselves (e.g., think of a Go app that uses Go libraries, or a Rails app that uses gems). I plan on producing SBOMs for each of my "child" products that will contain a list of "grandchildren" products.

I'd like to be able to search for a CVE present in a "grandchild" product and identify: 1) which "parent" product versions are vulnerable, 2) which "child" product "versions" are vulnerable.

I'm presuming whatever object association this roll-up produces would hopefully result in roll-ups both on the Project index view and the "affected products" tab of a CVE detail (i.e., rather than the project index being achieved via some kind of view side summary, I would expect my parent product to return a collection of child product CVEs)

software-testing-professional commented 1 year ago

I'd like to share some use cases as well.

We have a similar structure and are also trying to model it in DependencyTrack. In addition, we have multiple target environments for our applications, such as public cloud, private cloud, or application servers. So a CVE might affect applications running in a public cloud, but not on premise.

Use case 1: aggregation (horizontal)

Teams create microservices or libraries, that are scanned and uploaded into DependencyTrack projects. A microservice repository holds source code, but also Docker images and 3rd party images, coming from Helm chart dependencies. So a complete repository scan creates 1-n SBOM files:

source code plus Docker image
Helm chart dependencies (for example postgres)

And one repository is represented by 1-n DependencyTrack projects.

Currently I'm uploading them using a "virtual" directory structure (== project names with "/").

demo-app:v1.2.3 for the source code.
demo-app/postgres:latest for the Helm chart dependency.

This way the postgres coming from demo-app is kept separate from other postgres databases used by different microservices and, more importantly, used in different target environments. If I deploy into the public cloud, I might be affected by a CVE, whereas the private cloud deployments are not.

Additionally I place a tag demo-app:v1.2.3 on both projects (demo-app + demo-app/postgres), to bundle them for filtering etc.

Use case 2: aggregation (vertical)

We aggregate microservices and libraries into bigger systems (self-contained systems). A production system is made of

microservices (including Helm chart dependencies)
microservices that have components which are also separate DependencyTrack projects. (But this is currently not supported at all, and a bit out of scope here ;-) )
self-contained systems (which themselves contain microservices, or even "smaller" self-contained systems).

@ragaskar I assume these use cases match the parent-child-grandchild relationships you described. :-)

It would be a really valuable feature if DependencyTrack could support the horizontal and vertical aggregation.

robertlagrant commented 1 year ago

This might be a separate ticket, but if it were possible to record fix timescales, then also aggregating timescales on when parent projects/deployments have the transitive vulnerability removed would be excellent, so you could know what your intended vulnerability estate would look like forward over time.

rkg-mm commented 1 year ago

Seems like this is a highly requested addition to the trees. However, I don't think I have the resources left to sponsor this feature at the moment. Is there someone else who could volunteer for this?

I think we see multiple parts here:

Defining how a "collection" project (I don't like the name, any proposals?) should behave (aggregate all children, show one specific child, see above...)
Calculating the numbers for the Project-Tree
Show aggregated vulnerabilities and policy violations in the collection projects tab
(Probably a separate feature, since in my opinion not a parent-child relationship) Possibility to have one project being a "component" of multiple other projects, e.g. by referencing another project in SBOM.

robertlagrant commented 1 year ago

@rkg-mm I've dragged some of the earlier conversation a little off-topic - apologies. Going back to your original question about what numbers should display, I was thinking one good option to offer users would be "all active subprojects". Then users can toggle them on and off depending on whether they consider them in active use.
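Something like this, as a quick Python sketch. The `active` flag and the dictionary shape are just assumptions for illustration.

```python
def sum_of_active_children(children):
    # Only children the user has left toggled "active" contribute.
    return sum(c["vulnerabilities"] for c in children if c["active"])


# Made-up example: an old version has been toggled off.
children = [
    {"name": "1.0", "active": False, "vulnerabilities": 5},
    {"name": "1.1", "active": True, "vulnerabilities": 3},
    {"name": "2.0", "active": True, "vulnerabilities": 2},
]
```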
