chainloop-dev / chainloop

Chainloop is an Open Source evidence store for your Software Supply Chain attestations, SBOMs, VEX, SARIF, CSAF files, QA reports, and more.
https://docs.chainloop.dev
Apache License 2.0
364 stars, 27 forks

Integrating SBOM and Attestations with Backstage through a Chainloop Extension #1336

Open ImperiumTakp opened 1 week ago

ImperiumTakp commented 1 week ago

Proposal: Integrating SBOM and Attestations with Backstage through a Chainloop Extension

Context:

I recently developed a version matrix plugin for Backstage that presents package listings for different package managers. Initially, the implementation was focused on Composer, where I manually sourced package data and displayed it in Backstage. During the process, I discovered that Chainloop, along with Syft, could serve as an excellent resource for gathering Software Bill of Materials (SBOM) and other attestation data, which would be far more scalable and reliable than building a package listing system from scratch. This led me to explore the potential of integrating Chainloop with Backstage to offer SBOM version tracking.

Proposal:

I propose building a Chainloop extension that integrates SBOM and attestation data with Backstage, initially focusing on SBOM version listings. The architecture for this integration involves decoupling the systems through a message queue to facilitate robust communication between Chainloop's FanOut mechanism and Backstage's backend.

Architecture Overview:

graph TD
    %% Chainloop FanOut process
    subgraph Chainloop
        A1[Chainloop receives SBOM Data] --> A2[Chainloop FanOut Extension]
        A2 --> A3[FanOut Task triggered]
        A3 --> A4[Prepare Message for Queue]
    end

    %% Message Queue decoupling the system (declared once, outside both subgraphs)
    MQ1[(Message Queue - NATS/Kafka/SQS)]
    A4 --> MQ1

    %% Backstage Backend Plugin with detailed tasks
    subgraph Backstage Backend Plugin
        MQ1 --> B1[Scheduled Task]
        B1 --> B2[Pull Message from Queue]
        B2 --> B3[Parse SBOM Data from Message]
        B3 --> B4[Match SBOM to Backstage Entity]
        B4 --> B5[Process SBOM Packages]
        B5 --> B6[Store Processed Data in PostgreSQL DB]
    end

    %% PostgreSQL Database storing the processed data
    B6 --> DB1[(PostgreSQL Database)]

    %% Backstage User interaction with processed data
    subgraph Backstage User Interaction
        U1[Backstage User] --> UI1[Backstage Frontend - View Entity Page]
        UI1 --> UI2[Request SBOM Package Data]
        UI2 --> DB1
        DB1 --> UI3[Render SBOM Package Details on UI]
    end

Detailed Flow:

  1. Chainloop FanOut: Chainloop receives SBOM data, which is processed through a custom FanOut extension. The extension prepares the SBOM data and pushes it to a message queue (e.g., NATS, Kafka, or SQS).

  2. Message Queue: A message queue decouples Chainloop and Backstage, ensuring reliability and flexibility in communication.

  3. Backstage Backend Plugin:

    • A scheduled task in Backstage pulls messages from the queue.
    • The plugin parses the SBOM data from the message and matches it to the relevant entity in Backstage.
    • The SBOM package data is then processed and stored in a PostgreSQL database.
  4. Backstage User Interaction:

    • Users interact with the processed SBOM data via Backstage’s frontend, where they can view version listings and package details related to the specific Backstage entity.
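
The backend steps above (parse, match, process) can be sketched as three small functions. This is only an illustration under stated assumptions: the SBOM is assumed to be CycloneDX JSON (one of the formats Syft can emit), the `SbomMessage` envelope and its `workflowName` field are hypothetical and not defined by Chainloop, and the lookup table stands in for whatever catalog query the plugin ends up using.

```typescript
// Minimal subset of a CycloneDX JSON SBOM; the real format has many more fields.
interface CycloneDxComponent {
  name: string;
  version?: string;
  purl?: string;
}

interface CycloneDxBom {
  bomFormat: string;
  components?: CycloneDxComponent[];
}

// Hypothetical message envelope; Chainloop does not define this shape.
interface SbomMessage {
  workflowName: string; // assumed identifier of the CI workflow that produced the SBOM
  sbomJson: string; // raw CycloneDX JSON document
}

// One row per package, ready to be stored in the plugin's database.
interface PackageRow {
  entityRef: string;
  name: string;
  version: string;
  purl: string | null;
}

// Step 1: parse and sanity-check the SBOM carried in the message.
function parseSbom(message: SbomMessage): CycloneDxBom {
  const bom = JSON.parse(message.sbomJson) as CycloneDxBom;
  if (bom.bomFormat !== "CycloneDX") {
    throw new Error(`unsupported SBOM format: ${String(bom.bomFormat)}`);
  }
  return bom;
}

// Step 2: match the message to a Backstage entity reference.
// The map is a stand-in for a catalog lookup (an assumption).
function matchEntity(
  message: SbomMessage,
  workflowToEntity: Map<string, string>,
): string | undefined {
  return workflowToEntity.get(message.workflowName);
}

// Step 3: flatten the SBOM components into storable rows.
function toPackageRows(entityRef: string, bom: CycloneDxBom): PackageRow[] {
  return (bom.components ?? []).map(c => ({
    entityRef,
    name: c.name,
    version: c.version ?? "unknown",
    purl: c.purl ?? null,
  }));
}
```

Keeping each step a pure function means the scheduled task only has to wire them together and hand the rows to the database layer, whichever transport delivers the message.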

Open Questions for Discussion:

1. Message Queue Integration: I opted for a message queue between Chainloop and Backstage to decouple the systems and allow flexibility in the underlying infrastructure (e.g., supporting NATS, Kafka, or SQS). However, I'm open to feedback: do you see this as the best approach, or would a more direct integration (e.g., HTTP-based or WebSocket) be preferable in this case? I’m particularly interested in understanding any trade-offs in terms of performance, complexity, or maintainability.

2. FanOut Process: Should the FanOut process be the preferred extension mechanism for handling this data in Chainloop, or is there another Chainloop component that could provide a more efficient or streamlined method for pushing SBOM data?

3. Backstage Backend Module Structure: Given that I plan to use multiple message queue systems (NATS, Kafka, etc.) based on user needs, should this be handled in a modular fashion within the Backstage plugin, or would a more uniform approach with configurable options be easier to maintain?
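
To make the "modular" option in question 3 concrete, here is a minimal sketch of one way it could look: the scheduled task depends only on a small interface, and each queue system registers its own adapter. All names here (`SbomSource`, `SourceRegistry`, `InMemorySource`, `QueuedSbom`) are hypothetical, and no real NATS, Kafka, or SQS client is used.

```typescript
// Hypothetical message shape shared by all queue adapters.
interface QueuedSbom {
  id: string;
  body: string;
}

// The only contract the scheduled task would depend on.
interface SbomSource {
  pull(max: number): Promise<QueuedSbom[]>;
  ack(id: string): Promise<void>;
}

// In-memory adapter, handy for tests. NATS/Kafka/SQS adapters would
// implement the same interface in their own modules.
class InMemorySource implements SbomSource {
  private pending: QueuedSbom[] = [];

  push(message: QueuedSbom): void {
    this.pending.push(message);
  }

  async pull(max: number): Promise<QueuedSbom[]> {
    return this.pending.slice(0, max);
  }

  async ack(id: string): Promise<void> {
    this.pending = this.pending.filter(m => m.id !== id);
  }
}

type SourceFactory = () => SbomSource;

// Maps a configured "kind" string to the adapter that handles it.
class SourceRegistry {
  private factories = new Map<string, SourceFactory>();

  register(kind: string, factory: SourceFactory): void {
    this.factories.set(kind, factory);
  }

  create(kind: string): SbomSource {
    const factory = this.factories.get(kind);
    if (!factory) {
      throw new Error(`no SBOM source registered for "${kind}"`);
    }
    return factory();
  }

  kinds(): string[] {
    return Array.from(this.factories.keys()).sort();
  }
}
```

With this shape, the "uniform approach with configurable options" becomes a thin layer on top: a single config value selects the `kind`, while each adapter still lives in its own module and can be installed only when needed.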

Reason for Proposal:

As I mentioned earlier, I initially built a version matrix plugin for Backstage that provides listings of package versions for Composer. Realizing that tools like Syft and Chainloop can automate and centralize SBOM generation and attestation, I want to leverage Chainloop’s existing capabilities instead of manually building a version tracking system from scratch. This integration would provide a scalable solution for organizations using Chainloop and Backstage, starting with SBOM listings and potentially expanding to other types of attestations in the future.

I believe this proposal could greatly enhance both Chainloop's extensibility and Backstage's capability to display SBOM and attestation data, providing developers with more transparency and traceability in their software supply chains.

Looking forward to your feedback and suggestions on this integration approach!

migmartri commented 1 week ago

hi @ImperiumTakp

First of all, thank you so much for taking the time to craft such a detailed feature request.

From the functional point of view, your request makes sense overall. We have questions, though, that might impact the implementation.

  1. Backstage Backend Module

Do you envision this plugin to be proprietary or general-purpose and open source?

We are open to both, but knowing which one it is might affect how we should communicate with it. For example, if the plugin were OSS, we could start with a Backstage fanout + Backstage plugin using HTTP as the first step. But if it's proprietary, a general-purpose NATS integration might make sense.

Message Queue: A message queue decouples Chainloop and Backstage, ensuring reliability and flexibility in communication.

We have discussed in the past how a fanout integration for an event bus could be useful (https://github.com/chainloop-dev/chainloop/issues/419), but we have always thought of the fanout plugin as pushing to a topic on an event bus running somewhere else.

It's not clear to me whether you are proposing that Chainloop should run a NATS stream instance itself, or that the instance should instead run next to Backstage.

initially focusing on SBOM version listings.

I'd love to understand more about what you mean by version listing. That way, we can decide what kind of data must be sent over the stream, i.e., attestation info, SBOM, or both.

These are some of the questions that were raised during an internal chat with the team. We are very excited about your proposal, so we'd like to discuss it further.

Would you be able to join our Slack community and maybe we can chat there?

Thanks again!

ImperiumTakp commented 1 week ago

Hi @migmartri

A bit of background

As you can see in the screenshot, this plugin provides an overview of PHP dependencies across different systems (it’s for Composer, specifically). The idea is to track package versions per system, and since SBOM data relates to this, I thought it made sense to use SBOMs generated during the CI process as the "source of truth" for this info. How we get that data into Backstage is really just a technical detail, but it feels like a natural fit with Chainloop.

Clarifying a few things:

1. Backstage Backend Module: Open Source

Yes, I envision the plugin being open-source and general-purpose. Backstage has a solid plugin ecosystem (check Backstage Plugins), and I think this would be a great addition for anyone using Chainloop and Backstage together. It would be super useful for anyone already running Backstage, and they could integrate Chainloop seamlessly if they decided to.

2. Message Queue or Another Way?

I suggested a message queue because Backstage typically pulls data while Chainloop pushes it, so a queue seemed like a good bridge. That said, I’m totally open to simpler options like using HTTP for the MVP. The main goal is to make sure Backstage can process and own the SBOM data, keeping things smooth and self-contained.

3. SBOM Version Listings

When I talk about “SBOM version listings,” I mean using Chainloop to track which package versions are included in the SBOM for each system. In the current version matrix plugin (screenshot above), we track dependency versions per system. Ideally, Chainloop would push the SBOM info, and then Backstage would pull that data to display the package versions in a similar matrix. The initial focus is on showing versions, but there’s definitely room to expand to attestation data or other SBOM details down the line.

[Image: screenshot of the version matrix plugin]
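
As a rough illustration of the "version listing" idea described above, the sketch below pivots per-system package lists into a package-by-system matrix. The type and function names are hypothetical, and the package data in the usage note is made up for the example. It is not taken from the existing plugin.

```typescript
// Packages found in one system's SBOM. "system" is assumed to be the
// Backstage entity name.
interface SystemPackages {
  system: string;
  packages: { name: string; version: string }[];
}

// package name -> (system -> version)
type VersionMatrix = Map<string, Map<string, string>>;

// Pivot the per-system lists into one row per package.
function buildVersionMatrix(inputs: SystemPackages[]): VersionMatrix {
  const matrix: VersionMatrix = new Map();
  for (const { system, packages } of inputs) {
    for (const pkg of packages) {
      const row = matrix.get(pkg.name) ?? new Map<string, string>();
      row.set(system, pkg.version);
      matrix.set(pkg.name, row);
    }
  }
  return matrix;
}

// Render one text line per package, with "-" where a system
// does not ship that package.
function renderMatrix(matrix: VersionMatrix, systems: string[]): string[] {
  const names = Array.from(matrix.keys()).sort();
  return names.map(name => {
    const row = matrix.get(name);
    const cells = systems.map(s => row?.get(s) ?? "-");
    return [name, ...cells].join(" | ");
  });
}
```

For example, if a system `shop` ships `monolog/monolog` 3.5.0 and `psr/log` 3.0.0 while `billing` ships only `monolog/monolog` 2.9.1, `renderMatrix` yields `monolog/monolog | 3.5.0 | 2.9.1` and `psr/log | 3.0.0 | -`.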

ImperiumTakp commented 1 week ago

I'll definitely join the Slack! My plan for the next step is to create a public Backstage setup with the work-in-progress Chainloop plugin. Looking forward to collaborating more!

migmartri commented 1 week ago

Hi @ImperiumTakp, thanks again for taking the time to elaborate on those points. Please see my replies inline.

The idea is to track package versions per system, and since SBOM data relates to this, I thought it made sense to use SBOMs generated during the CI process as the "source of truth" for this info. How we get that data into Backstage is really just a technical detail, but it feels like a natural fit with Chainloop.

I agree; it looks like a great fit. Let's do it :)

  1. Backstage Backend Module: Open Source

Perfect!

The reason I am asking is that creating an Open Source integration for Backstage was already in our backlog, so we will be happy to collaborate with you, potentially on both the frontend and backend side :)

Originally, though, we were thinking of adding it as part of the https://github.com/chainloop-dev GitHub organization. That way

Would you be open to hosting this plugin code under the https://github.com/chainloop-dev org? Of course, we will add a permissive Apache License, you will be a core contributor to the repo, etc.

  2. Message Queue or Another Way?

I agree that an event bus is a scalable and extensible way of connecting both projects, but I am worried that it will add another dependency that somebody will need to deploy. We have already received feedback that Chainloop is complex to set up, so adding an event bus just for this does not seem like an option in the short term.

Using an event bus also raises other challenges, such as authentication and reachability. So, for the initial implementation, I'd suggest that we stick to POST API calls, similar to what we do for DependencyTrack, unless you think there will be a big bottleneck. Note that eventually, on the Chainloop side, we are going to add more resiliency and control to the fanout interface so that we can be more gentle with the Backstage endpoint :) https://github.com/chainloop-dev/chainloop/issues/39
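
To give a feel for what "resiliency and control" on a POST-based integration could mean, here is a minimal sketch of retrying with capped exponential backoff. It is purely illustrative: it is written in TypeScript to match the Backstage side (the Chainloop fanout itself is a separate codebase), the function names and default values are assumptions, and it does not describe how Chainloop's fanout or its DependencyTrack integration actually behaves.

```typescript
// Retry only transient failures: network errors (represented as status 0 here),
// 429 Too Many Requests, and 5xx responses.
function isRetryable(status: number): boolean {
  return status === 0 || status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff capped at maxMs: base, 2*base, 4*base, ...
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// The transport is injected so the retry logic stays independent of any HTTP client.
// It resolves to the HTTP status code of the POST.
type Send = (body: string) => Promise<number>;

async function postWithRetry(
  send: Send,
  body: string,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<number> {
  let status = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // A rejected send is treated as a network error.
    status = await send(body).catch(() => 0);
    if (!isRetryable(status)) {
      return status;
    }
    if (attempt < maxAttempts - 1) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  return status;
}
```

The cap keeps a long outage on the receiving side from producing ever-growing waits, and not retrying 4xx responses (other than 429) avoids hammering the endpoint with a payload it will never accept.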

  3. SBOM Version Listings

Perfect, thanks for the info. I think that sending SBOM data + Attestation info would do the trick. Attestation info contains more metadata that could be useful down the line.

I can also see that you are already in Slack, welcome! Let me know if you want to have a quick chat with our team so we can put a face to a name and flesh out a plan, what do you think? :)

Thanks again, Miguel

migmartri commented 11 hours ago

I've created a repo https://github.com/chainloop-dev/backstage-plugin