DependencyTrack / dependency-track

Dependency-Track is an intelligent Component Analysis platform that allows organizations to identify and reduce risk in the software supply chain.
https://dependencytrack.org/
Apache License 2.0
2.65k stars, 564 forks

Proposal : Distributed event bus for Dependency Track #1856

Open syalioune opened 2 years ago

syalioune commented 2 years ago

Current Behavior:

Dependency Track is by design an event-based system. It currently uses the in-memory event distribution system provided by the Alpine framework, but that system is restricted to a single JVM, which limits the solution's scalability.
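To illustrate the single-JVM limitation, here is a minimal sketch of an in-process publish/subscribe bus. It is not the Alpine API; the names `InMemoryEventBus` and `BomUploadEvent` are hypothetical and exist only for this example. The point is that subscribers are held in one process's heap, so a second instance (standing in for a second JVM) never sees events published on the first.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process event bus for illustration; NOT the Alpine event API. */
final class InMemoryEventBus {
    // Subscribers live in this process's heap, keyed by event type.
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    /** Registers a handler for events of exactly the given type. */
    <E> void subscribe(Class<E> type, Consumer<? super E> handler) {
        subscribers.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>())
                   .add(event -> handler.accept(type.cast(event)));
    }

    /** Delivers the event synchronously to local subscribers; returns how many were notified. */
    int publish(Object event) {
        List<Consumer<Object>> handlers = subscribers.getOrDefault(event.getClass(), List.of());
        handlers.forEach(h -> h.accept(event));
        return handlers.size();
    }

    /** Hypothetical event type, assumed for this example only. */
    static final class BomUploadEvent {
        final String projectName;

        BomUploadEvent(String projectName) {
            this.projectName = projectName;
        }
    }

    public static void main(String[] args) {
        InMemoryEventBus node1 = new InMemoryEventBus();
        node1.subscribe(BomUploadEvent.class, e -> System.out.println("Analyzing " + e.projectName));
        node1.publish(new BomUploadEvent("example-project"));

        // A second bus stands in for a second JVM: it has no view of node1's subscribers.
        InMemoryEventBus node2 = new InMemoryEventBus();
        int delivered = node2.publish(new BomUploadEvent("example-project"));
        System.out.println("Handlers notified on node2: " + delivered);
    }
}
```

A distributed bus would replace the in-heap subscriber map with a broker or shared store, so that work published by one instance can be picked up by another.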

Furthermore, I understand from Issue 1210 and several other comments I've seen here and there that the architectural design direction goes toward more standalone components doing specialized tasks.

Proposed Behavior:

What I'm proposing is a study, followed by PoC(s), to consider distributed event buses as alternatives to the Alpine event subsystem. It could be a first step toward decoupling several Dependency Track services.

Relevant candidates would be :

Obviously, eventing requirements should be defined first (need for queuing, pub/sub, consumer groups, priority handling, duplicate handling, etc.).

nscuro commented 2 years ago

Thanks for raising this @syalioune. I'm happy you're bringing this up, as it is something we're actively looking into right now.

But as you said, we need to identify our problems first before we can decide on solutions and their implementation. We already identified a few bottlenecks, and not all of them are solvable by introducing distributed messaging (although that probably depends on how messaging is used).

We also have to keep in mind that introducing a component like RabbitMQ and especially Kafka may drastically increase operational complexity, to a point where some users won't be able to operate DT on their own anymore.

I think it makes sense for us to do some initial information gathering "internally" first, and then discuss this in the open later so that contributors like yourself can participate.

syalioune commented 2 years ago

Great! Keep us posted.

robertlagrant commented 1 year ago

@nscuro what's the event volume/complexity like? Is it possible to use the database as a simple event store?

nscuro commented 1 year ago

Every BOM upload generates an event. Depending on how it's triggered, every project or individual component may produce multiple events (check for policy violations, check for vulnerabilities, check for newer versions, calculate metrics etc.). The number of these events is amplified by BOM upload frequency and scheduling preferences of recurring tasks. Handling of most of these events will involve database access already.

While the event volume will be fairly low for small to medium sized portfolios with low-ish BOM upload frequency, I think it's a fair estimate that we're talking about hundreds of thousands if not millions of events per hour for larger portfolios with higher BOM upload frequency.
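As a rough back-of-the-envelope check on that estimate, the fan-out can be sketched as one upload event per BOM plus a fixed number of follow-up events per component. Every figure below (uploads per hour, components per BOM, and the assumption that each component triggers all four checks named above) is hypothetical and chosen only to show the order of magnitude; none of it is measured Dependency-Track data.

```java
/** Illustrative estimate only; all input figures are assumptions, not measurements. */
final class EventVolumeEstimate {

    /**
     * Each BOM upload yields one upload event plus
     * (components per BOM x events per component) follow-up events.
     */
    static long eventsPerHour(long uploadsPerHour, long componentsPerBom, long eventsPerComponent) {
        return uploadsPerHour * (1 + componentsPerBom * eventsPerComponent);
    }

    public static void main(String[] args) {
        // Assumed: policy check, vulnerability check, version check, metrics calculation.
        long eventsPerComponent = 4;

        // Hypothetical small portfolio: 10 uploads/hour, 100 components per BOM.
        System.out.println(eventsPerHour(10, 100, eventsPerComponent));   // prints 4010

        // Hypothetical large portfolio: 1,000 uploads/hour, 250 components per BOM.
        System.out.println(eventsPerHour(1000, 250, eventsPerComponent)); // prints 1001000
    }
}
```

Under these assumed inputs the large portfolio lands at roughly a million events per hour, before counting events from recurring scheduled tasks.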

If we really want to appeal to users with large portfolios, using the database as an event store will not cut it.

(I'm aware this is a bold assertion and I do welcome other opinions btw)