backstage / backstage

Backstage is an open framework for building developer portals
https://backstage.io/
Apache License 2.0

feat: events #13931

Closed pjungermann closed 1 year ago

pjungermann commented 1 year ago

feat(events): add events management capabilities

This change introduces new plugins that provide the basics for managing events inside Backstage. It offers extension points for adding event publishers and subscribers, as well as for replacing the event broker implementation.

All plugins support the new backend-plugin-api.

Relates-to: #11082

feat(events/http): add HTTP endpoint-based event publisher

This plugin adds an event publisher which receives events via one or more HTTP endpoints and can be used as the destination for webhook subscriptions.

Relates-to: #11082

feat(events,example): integrate at example backend

Integrate plugins-events-backend with plugins-events-backend-module-http at the example backend.

feat(events,example): add simple way to add event-based entity providers

Add DemoEventBasedEntityProvider as example implementation.

feat(events/sqs): add a new AWS SQS event publisher

This change introduces a new plugin @backstage/plugin-events-backend-module-sqs.

This plugin provides an event publisher which receives events from one or more AWS SQS queues and publishes them to the event broker.

The plugin supports the new backend-plugin-api and connects with the other plugins.

feat(events/bitbucketCloud): add BitbucketCloudEventRouter

Add an event router for Bitbucket Cloud which handles events from the topic bitbucketCloud and re-publishes them under a more specific topic based on the x-event-key metadata, e.g., bitbucketCloud.repo:push.

fix(catalog/bitbucketCloud): fix test file name

The test file name was not adjusted as part of PR #13859.

Relates-to: PR #13859

feat(events,catalog/bitbucketCloud): handle repo:push events

Handle Bitbucket Cloud repo:push events at the BitbucketCloudEntityProvider by subscribing to topic bitbucketCloud.repo:push.

Implements EventSubscriber to receive events for the topic bitbucketCloud.repo:push.

On repo:push, the affected repository will be refreshed. This includes adding new Location entities, refreshing existing ones, and removing obsolete ones.

To support this, a new annotation bitbucket.org/repo-url was added to Location entities.
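
The refresh on repo:push boils down to comparing the catalog files already known for a repository with those discovered after the push. The following is only a minimal sketch, assuming each Location can be identified by its target URL; `computeLocationDelta` and `LocationDelta` are hypothetical names and not part of the provider.

```typescript
// Hypothetical helper, not the provider's actual code.
interface LocationDelta {
  added: string[]; // catalog files that appeared with the push
  removed: string[]; // catalog files that no longer exist
  refreshed: string[]; // catalog files that exist before and after
}

function computeLocationDelta(
  existing: Iterable<string>,
  discovered: Iterable<string>,
): LocationDelta {
  const before = new Set(existing);
  const after = new Set(discovered);
  return {
    added: Array.from(after).filter(url => !before.has(url)),
    removed: Array.from(before).filter(url => !after.has(url)),
    refreshed: Array.from(after).filter(url => before.has(url)),
  };
}
```

Only the `added` and `removed` sets need to be turned into catalog mutations; the `refreshed` set only needs a refresh request.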

A full refresh requires 1 API call to Bitbucket Cloud to discover all catalog files. Handling one repo:push event also requires 1 API call in order to know which catalog files exist. This may lead to more discovery-related API calls (code search). The main cause for hitting the rate limits is Location refresh-related operations.

A reduction of total API calls to reduce the rate limit issues can only be achieved in combination with

  1. reducing the full refresh frequency (e.g., to monthly)
  2. reducing the frequency of general Location refresh operations by the processing loop

For (2.), it is not possible to reduce the frequency only for Bitbucket Cloud-related Locations though.

Further optimizations might be required to resolve the rate limit issue.

Relates-to: #10866

feat(events/github): add GithubEventRouter

Add an event router for GitHub which handles events from the topic github and re-publishes them under a more specific topic based on the x-github-event metadata, e.g., github.push.

feat(events/gitlab): add GitLabEventRouter

Add an event router for GitLab which handles events from the topic gitlab and re-publishes them under a more specific topic based on the $.event_name payload field, e.g., gitlab.push.

feat(events/azure): add AzureDevOpsEventRouter

Add an event router for Azure DevOps which handles events from the topic azureDevOps and re-publishes them under a more specific topic based on the $.eventType payload field, e.g., azureDevOps.git.push.

feat(events/gerrit): add GerritEventRouter

Add an event router for Gerrit which handles events from the topic gerrit and re-publishes them under a more specific topic based on the $.type payload field, e.g., gerrit.change-merged.
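
The five routers share one idea: derive a sub-topic from either a metadata entry or a payload field and re-publish under `<topic>.<sub-topic>`. The sketch below condenses that idea into one table-driven function. It is not how the routers in this PR are structured; `resolveSubTopic` and `SubTopicSource` are hypothetical names, and the `EventParams` shape is assumed from the `params.eventPayload` / `params.metadata` usage shown later in this thread.

```typescript
// Assumed event shape; the field names follow the usage shown in this thread.
interface EventParams {
  topic: string;
  eventPayload: unknown;
  metadata?: Record<string, string | string[] | undefined>;
}

// Hypothetical description of where each provider keeps its event type.
type SubTopicSource =
  | { kind: 'metadata'; key: string }
  | { kind: 'payload'; field: string };

const subTopicSources: Record<string, SubTopicSource> = {
  bitbucketCloud: { kind: 'metadata', key: 'x-event-key' },
  github: { kind: 'metadata', key: 'x-github-event' },
  gitlab: { kind: 'payload', field: 'event_name' },
  azureDevOps: { kind: 'payload', field: 'eventType' },
  gerrit: { kind: 'payload', field: 'type' },
};

// Returns the more specific topic, or undefined if none can be derived.
function resolveSubTopic(params: EventParams): string | undefined {
  const source = subTopicSources[params.topic];
  if (!source) {
    return undefined;
  }
  let value: unknown;
  if (source.kind === 'metadata') {
    value = params.metadata?.[source.key];
  } else if (typeof params.eventPayload === 'object' && params.eventPayload !== null) {
    value = (params.eventPayload as Record<string, unknown>)[source.field];
  }
  return typeof value === 'string' && value.length > 0
    ? `${params.topic}.${value}`
    : undefined;
}
```

A router would then publish the unchanged payload and metadata under the resolved topic and skip events for which no sub-topic can be derived.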

Hey, I just made a Pull Request!

This PR will introduce event management without any persistence or distribution inside of the cluster through the event broker. It can be used for reacting to webhook events by SCM providers like GitHub or Bitbucket Cloud (included), however it is not limited to these use cases. All parts can be extended (or replaced in case of the event broker) using additional modules to customize it for your needs.

Via the http and sqs modules, you have two available options on how to receive events from the outside (http: HTTP POST requests to designated endpoints, sqs: as messages through AWS SQS queues). Further options for event publishers can be added (internal or wrapping external sources).
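
To illustrate what "without any persistence or distribution" implies, a broker of this kind can be as small as an in-memory map from topics to subscribers. The sketch below is not the code from this PR: `InMemoryEventBroker` and `subscribe` are hypothetical names, and the `EventParams` shape is assumed from the `params.eventPayload` / `params.metadata` usage later in this thread. `EventSubscriber`, `supportsEventTopics`, and `onEvent` are the names used in the discussion.

```typescript
// Assumed event shape (see lead-in).
interface EventParams {
  topic: string;
  eventPayload: unknown;
  metadata?: Record<string, string | string[] | undefined>;
}

interface EventSubscriber {
  supportsEventTopics(): string[];
  onEvent(params: EventParams): Promise<void>;
}

// Hypothetical broker: events live only in this process and are
// delivered immediately to whoever is subscribed at publish time.
class InMemoryEventBroker {
  private readonly subscribersByTopic = new Map<string, EventSubscriber[]>();

  subscribe(...subscribers: EventSubscriber[]): void {
    for (const subscriber of subscribers) {
      for (const topic of subscriber.supportsEventTopics()) {
        const list = this.subscribersByTopic.get(topic) ?? [];
        list.push(subscriber);
        this.subscribersByTopic.set(topic, list);
      }
    }
  }

  async publish(params: EventParams): Promise<void> {
    const subscribers = this.subscribersByTopic.get(params.topic) ?? [];
    await Promise.all(subscribers.map(s => s.onEvent(params)));
  }
}
```

Because nothing is stored, an event published while no subscriber is registered for its topic is simply dropped, and other backend instances never see it.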

github-actions[bot] commented 1 year ago

Changed Packages

| Package Name | Package Path | Changeset Bump | Current Version |
|---|---|---|---|
| example-backend | packages/backend | none | v0.2.77-next.2 |
| @backstage/plugin-bitbucket-cloud-common | plugins/bitbucket-cloud-common | patch | v0.2.1-next.0 |
| @backstage/plugin-catalog-backend-module-bitbucket-cloud | plugins/catalog-backend-module-bitbucket-cloud | patch | v0.1.5-next.1 |
| @backstage/plugin-events-backend-module-aws-sqs | plugins/events-backend-module-aws-sqs | minor | v0.0.0 |
| @backstage/plugin-events-backend-module-azure | plugins/events-backend-module-azure | minor | v0.0.0 |
| @backstage/plugin-events-backend-module-bitbucket-cloud | plugins/events-backend-module-bitbucket-cloud | minor | v0.0.0 |
| @backstage/plugin-events-backend-module-gerrit | plugins/events-backend-module-gerrit | minor | v0.0.0 |
| @backstage/plugin-events-backend-module-github | plugins/events-backend-module-github | minor | v0.0.0 |
| @backstage/plugin-events-backend-module-gitlab | plugins/events-backend-module-gitlab | minor | v0.0.0 |
| @backstage/plugin-events-backend-test-utils | plugins/events-backend-test-utils | minor | v0.0.0 |
| @backstage/plugin-events-backend | plugins/events-backend | minor | v0.0.0 |
| @backstage/plugin-events-node | plugins/events-node | minor | v0.0.0 |

pjungermann commented 1 year ago

Maybe I should rename @backstage/plugin-events-backend-module-sqs to @backstage/plugin-events-backend-module-aws[-]sqs 🤔

pjungermann commented 1 year ago

I think I will move the event bus implementation to the events-node module so that events-backend-modules can access it e.g. at tests.

regicsolutions commented 1 year ago

Great work on this @pjungermann! Wondering how this differs from doing something like what's been documented by Frontside in this tutorial, from an SCM perspective:

https://frontside.com/blog/2022-05-03-backstage-entity-provider/

Have been looking for an event-driven solution for Bitbucket Server where I am able to capture the delta of what's been changed to the entity. I was going to give that tutorial a try, but your solution seems a lot more elegant 😀 Wondering if you are capturing the deltas in Bitbucket Cloud?

pjungermann commented 1 year ago

@regicsolutions thanks, I think I read this article when Taras shared it on Discord.

Overall, this PR is based on the RFC #11082 and discussions on Discord.

The solution described is possible with this setup, too, even though it still triggers a full refresh/mutation.

feat(events,catalog/bitbucketCloud): handle repo:push events

This handles deltas for Bitbucket Cloud. You can also choose a schedule for full refreshes (e.g., once a day/week/month/quarter/year/...).

You can decide whether you want your SCM provider/system to push events directly to an HTTP endpoint or pull them from an AWS SQS queue (where received webhook events are put). These are the included options so far; I plan to use AWS SQS for our org. You can add your own solution (contributed or private to your setup), too.

All similar webhook events by the different SCM providers like GitHub, GitLab, Azure DevOps, etc. differ a bit. At Bitbucket Cloud, we only have references to the commits (potentially truncated) and no information about what changed exactly. As the goal is to reduce API calls, I kept it simple. On GitHub's events, you get some additional information which can even help to ignore certain repo:push events (e.g., if there is no change to the catalog file).

The Bitbucket Cloud implementation was my original motivation to start the generic setup and could have been moved to a separate PR, however I thought it makes sense to include at least one subscriber implementation.

Whenever this PR is accepted and merged, you can add the bits for Bitbucket Server. Not sure how its events differ from Bitbucket Cloud. Might be pretty similar.

I guess implementations for GitHub, etc. will follow, too.

PS: Let me know if you see anything missing.

regicsolutions commented 1 year ago

Makes sense, appreciate the detailed response. Looking forward to it! 🎉

x3ro commented 1 year ago

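Hey folks. Thanks for the great work @pjungermann! I've tried it out, and there are two things that I'm wondering about:

  1. While an EventSubscriber can subscribe to many topics (array in supportsEventTopics), the information on which topic an event came from is not provided to onEvent. I think this would be a useful addition, for example when one wants to consume multiple different webhooks with the same subscriber.
  2. From a security perspective, I'm wondering if we could have some sort of validation mechanism for the incoming webhooks. A post-receive hook that gets called with the request after the payload is received, but before it is published to the event bus.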

A quick draft of what I mean:

  private addRouteForTopic(
    router: express.Router,
    topic: string,
    receiveHook?: (request: express.Request) => boolean,
  ): void {
    const path = `/${topic}`;

    router.post(path, async (request, response) => {
      const eventPayload = request.body;

      if (receiveHook && !receiveHook(request)) {
        response.json({ status: 'rejected' });
      } else {
        await this.eventBus!.publish(topic, eventPayload);
        response.json({ status: 'ok' });
      }
    });

    this.logger.info(
      `Registered /api/events/module/http${path} to receive events`,
    );
  }

pjungermann commented 1 year ago

While an EventSubscriber can subscribe to many topics (array in supportsEventTopics), the information on which topic an event came from is not provided to onEvent. I think this would be a useful addition, for example when one wants to consume multiple different webhooks with the same subscriber.

Good point. I can add the topic to the signature.

pjungermann commented 1 year ago

From a security perspective, I'm wondering if we could have some sort of validation mechanism for the incoming webhooks. A post-receive hook that gets called with the request after the payload is received, but before it is published to the event bus.

So far I expected consumers to validate / filter events if needed (e.g., sometimes there are sub-types which cannot be extracted in a generic way) or to use more specialized topics (e.g., bitbucketCloud vs bitbucketCloud/repo:push). Data is passed as unknown.

We could add another extension point for this validation. Or we could expect the EventPublisher to take care of this -- and for those getting them from external, they could either already implement it or allow for custom logic to be passed.

What do you think @x3ro ?

x3ro commented 1 year ago

I'm not sure I fully understand your comment.

So far I expected consumers to validate / filter events if needed [...]

This works if the information necessary for validation is included in the payload. In the case of the GitHub webhook signature, it's passed as a header, so this approach would not work.

What I wrote is essentially only relevant for potentially untrusted data I get from outside. For example, if I create a topic for GitHub webhooks, how do I make sure that the data I receive on that endpoint is actually coming from GitHub?

Or we could expect the EventPublisher to take care of this -- and for those getting them from external, they could either already implement it or allow for custom logic to be passed.

This seems roughly what I have in mind.

In essence, I'm wondering if the HttpEndpointEventPublisher can be made customizable enough to allow for untrusted input to be validated before being published as an event. If merged in its current form, we would have to write our own version of the http endpoint publisher for that.

I also just realized that, in my example above, I modified a private method, so that would of course not work 😅
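
One way to make the publisher customizable without touching private methods is to pass the validation hook in from the outside. The following is a minimal sketch of that idea, not the API that was eventually added to this PR; `RequestDetails`, `RequestValidator`, and `receiveEvent` are hypothetical names, and the `rejected` / `ok` statuses mirror the draft above.

```typescript
// Hypothetical, framework-independent view of an incoming request.
interface RequestDetails {
  body: unknown;
  headers: Record<string, string | string[] | undefined>;
}

// Hypothetical hook: return false to reject the request.
type RequestValidator = (request: RequestDetails) => boolean;

type Publish = (topic: string, eventPayload: unknown) => Promise<void>;

// Runs the optional validator before anything reaches the event bus.
async function receiveEvent(
  topic: string,
  request: RequestDetails,
  publish: Publish,
  validator?: RequestValidator,
): Promise<'rejected' | 'ok'> {
  if (validator && !validator(request)) {
    return 'rejected';
  }
  await publish(topic, request.body);
  return 'ok';
}
```

Because the validator sees the headers as well as the body, a header-based check such as a webhook signature fits into this hook.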

pjungermann commented 1 year ago

@x3ro I've added both. Let me know if this works for you.

pjungermann commented 1 year ago

@freben Would be great to get another review round from you

freben commented 1 year ago

Yahp I'll give it another look soon, let's go after the release anyway

frenchbread commented 1 year ago

Hey, @pjungermann, great work!

I would also like to suggest a small addition to the onEvent callback function. Since the request payload has a different type/structure for almost every event, why not add a third parameter to the onEvent callback, event: string (i.e., request.headers['x-github-event']), or go ahead and pass the entire Request object instead of just the payload? That would help differentiate events in relation to payloads. For example:

const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, event: string, payload: any) => {
    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];

// or

const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, request: Request) => {
    const event = request.headers['x-github-event'];

    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];

update:

Since the header name is provider-specific, it would probably be better to pass the request context as the third parameter. For example:

const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, payload: any, ctx: Request) => {
    const event = ctx.get('x-github-event'); // or ctx.headers['x-github-event']

    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];

pjungermann commented 1 year ago

@frenchbread Thank you for your comment and ideas!

In general, I would not like to add Request to the subscriber onEvent signature as it does not necessarily originate from an HTTP request; see the SQS-based publisher compared to the HTTP-based publisher. event seems tricky to provide.

Also, the implementation is supposed to be generic and not only for webhooks as provided by the SCM systems like GitHub, GitLab, Bitbucket Cloud, etc.

I understand and share your general intent, though, to be able to get access to the event type, and in the best case even to subscribe only to a certain type of webhook event.

Before, I was wondering whether we could add this to the topic so that you can subscribe to specific webhook event types like github/repository or github/push instead of just github, or bitbucketCloud/repo:push instead of just bitbucketCloud, etc.

With the current setup, we could already use such more specific topics, however this would result in separate endpoints for each type of webhook event which means you would need to add separate subscriptions at the SCM provider, too. And these might have limits on the amount of subscriptions.

We could extract the information as long as it is provided in some form, but as they provide this in different ways it is a bit challenging to do this in a generic/agnostic way.

GitHub, GitLab and AzureDevOps support webhook secrets using the headers X-Hub-Signature and X-Hub-Signature-256 (when used with a secret). This was brought up by @x3ro above and could be solved using the added validator support.

We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher. 🤔

You could also think about whether we need generic support instead of adding something like a HttpGithubWebhookPublisher, etc.

Supporting HTTP/SQS/... as options may make this a bit more complicated.

We could consider having something like:

HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be an x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔

Another option might be to allow to extract the sub-topic/event type from a given payload property or header (Azure DevOps: property $.eventType, Bitbucket Cloud: header x-event-key, GitHub: header x-github-event, GitLab: property $.event_name) which could be configured in a generic way via app-config. Not sure if this is a great approach though.

Just some ad-hoc thoughts. Let me know what you think.

pjungermann commented 1 year ago

@freben thanks for the comments. I will go through them in detail soon.

As you were just on it: Do you have any thoughts on the discussion above?

Currently, I consider

HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be an x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔

freben commented 1 year ago

let me get back on that topic :) getting late

pjungermann commented 1 year ago

We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher. 🤔

I have added metadata support for now which will be filled with headers (http) or message attributes (sqs).

E.g., used like

    if (params.metadata?.['x-event-key'] === 'repo:push') {
      await this.onRepoPush(params.eventPayload as Events.RepoPushEvent);
    }

pjungermann commented 1 year ago

I looked into the failed test (e2e test on Windows; failed at yarn install after creating a new app). Locally on Mac it worked fine though.

I assume it is unrelated to this change.

frenchbread commented 1 year ago

@pjungermann thanks for your thoughts and implementation!

Before, I was wondering whether we could add this to the topic so that you can subscribe to specific webhook event types like github/repository or github/push instead of just github, or bitbucketCloud/repo:push instead of just bitbucketCloud, etc.

With the current setup, we could already use such more specific topics, however this would result in separate endpoints for each type of webhook event which means you would need to add separate subscriptions at the SCM provider, too. And these might have limits on the amount of subscriptions.

I liked your idea of the topic-route approach, but as you mentioned, it would require adding multiple (or a lot of) webhooks, which is not an easily manageable solution.

We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher.

HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be an x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔

Both of these two options sound good to me. The metadata based approach basically extends what I've suggested. Thanks for adding it!

pjungermann commented 1 year ago

@frenchbread the metadata part is already integrated here (incl. support at http and sqs transports) and I worked on event router implementations to make it easier for subscribers. I will push these soon as well.

As you noted, I also decided against the subscription per webhook event type option -- even though it can be used if an org decides for that (e.g., if you are only interested in one or a few of these).

pjungermann commented 1 year ago

@frenchbread I have added the event routers, too.

What I didn't implement is the signature verification:

GitHub, GitLab and AzureDevOps support webhook secrets using the headers X-Hub-Signature and X-Hub-Signature-256 (when used with a secret).

frenchbread commented 1 year ago

@pjungermann That's great!

What I didn't implement is the signature verification

Right, I'm thinking that verification can be implemented separately for each entity provider. E.g. for github it's implemented within octokit/methods. Not sure about gitlab and bitbucket.

pjungermann commented 1 year ago

Right, I'm thinking that verification can be implemented separately for each entity provider.

I think there are various options. E.g., it could be done using a request validator at the http transport support (module "http") which will receive HTTP requests from GitHub for the topic github.

No other subscriber (like an entity provider, ...) would need to do it on their own, and all of them could rely on verified events.

On the other hand, we have other transports like using SQS. I didn't add any validator support there as I assume that this can be managed externally and all messages are "verified".

Another option could be to do it at the event router. This would mean that events for sub-topics will be verified, however other subscribers to the general topic (e.g., github) would receive unverified events.

Or we could do it at each subscriber. That seems to be the most defensive option. It might cause the same event to be verified multiple times though which seems unnecessary.

Not sure about gitlab and bitbucket.

Bitbucket Cloud does not support this feature, however Azure DevOps does, based on what I read.

They all seem to use the x-hub-signature-256 with HMAC hex digest. An implementation is potentially reusable across all of these three.
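
For reference, an HMAC check of this kind can be sketched with Node's built-in crypto module. This is a minimal sketch, not the draft mentioned in this thread: `verifyHubSignature256` is a hypothetical name, and it assumes the `sha256=<hex digest>` header format that GitHub documents for X-Hub-Signature-256. Whether other providers use the same format is not verified here.

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Hypothetical helper: checks an x-hub-signature-256 style header against
// the HMAC-SHA256 hex digest of the raw request body.
function verifyHubSignature256(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith('sha256=')) {
    return false;
  }
  const digest = createHmac('sha256', secret).update(rawBody).digest('hex');
  const expected = Buffer.from(`sha256=${digest}`, 'utf8');
  const actual = Buffer.from(signatureHeader, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

The digest has to be computed over the raw body bytes as received; re-serializing the parsed JSON can change whitespace or key order and invalidate the signature.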

pjungermann commented 1 year ago

Another option for signature verification:

We could use the verify option at the body parser

express.json({
  verify: (req, res, buf, encoding) => {
    // ...
  },
});

This would be a rather generic implementation though applied to all incoming requests. Of course, if there is no such header, it would pass.

pjungermann commented 1 year ago

https://docs.gitlab.com/ee/user/project/integrations/webhooks.html#validate-payloads-by-using-a-secret-token

GitLab uses X-Gitlab-Token which is just passing the secret token which was defined at GitLab / at the webhook subscription. It is not using HMAC hex digests with the payload and secret.
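
Since the token is sent as-is, verification reduces to comparing the header value with the configured secret. A minimal sketch, assuming the header value has already been extracted from the request; `verifyGitlabToken` is a hypothetical name. Hashing both values first is a design choice made here so that a constant-time comparison can be used regardless of input length.

```typescript
import { createHash, timingSafeEqual } from 'crypto';

// Hypothetical helper: compares the X-Gitlab-Token header value with the
// configured secret without leaking timing information.
function verifyGitlabToken(
  tokenHeader: string | undefined,
  secret: string,
): boolean {
  if (tokenHeader === undefined) {
    return false;
  }
  // Both digests are 32 bytes, which satisfies timingSafeEqual's
  // equal-length requirement.
  const actual = createHash('sha256').update(tokenHeader).digest();
  const expected = createHash('sha256').update(secret).digest();
  return timingSafeEqual(actual, expected);
}
```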

pjungermann commented 1 year ago

I've implemented the signature/token verification for GitHub and GitLab, too, however I think it should be moved to a separate PR.

pjungermann commented 1 year ago

For those interested in signature verification (current draft):

Not yet sure whether it makes sense to keep it at these modules due to the required dependencies or whether it should be moved to separate ones.

However, as I wrote I will keep this for another PR.

pjungermann commented 1 year ago

FYI: as discussed, due to absences among the maintainers and the need to discuss this properly, the topic will be on hold until the end of next week.

pjungermann commented 1 year ago

The current failure is unrelated to the changes, and I will attempt a retry by re-pushing the last commit.

FAIL plugins/catalog-backend/src/service/DefaultRefreshService.test.ts (68.141 s)
  ● Refresh integration › should refresh the parent location, "POSTGRES_13"

pjungermann commented 1 year ago

@freben I've pushed all changes as discussed on Discord. "http" module is now part of "events-backend" and "events-node".

regicsolutions commented 1 year ago

@pjungermann is it possible to add bitbucketServer support?

pjungermann commented 1 year ago

@pjungermann is it possible to add bitbucketServer support?

So far, there are only event-based entity updates / ingestion for Bitbucket Cloud, and event router implementations for the others, excluding Bitbucket Server.

You can add this for Bitbucket Server as soon as this PR is merged, though.

Personally, I will not add support for it as we don't use it and I have no way to test it anyways. Also, I have no documentation about webhooks there, so didn't add the event router either. Feel free to contribute.

awanlin commented 1 year ago

Can't wait for the next release!!! Awesome work @pjungermann!!!

pjungermann commented 1 year ago

@awanlin it made it into 1.8.0 last Tuesday :-)