Closed pjungermann closed 1 year ago
Package Name | Package Path | Changeset Bump | Current Version |
---|---|---|---|
example-backend | packages/backend | none | v0.2.77-next.2 |
@backstage/plugin-bitbucket-cloud-common | plugins/bitbucket-cloud-common | patch | v0.2.1-next.0 |
@backstage/plugin-catalog-backend-module-bitbucket-cloud | plugins/catalog-backend-module-bitbucket-cloud | patch | v0.1.5-next.1 |
@backstage/plugin-events-backend-module-aws-sqs | plugins/events-backend-module-aws-sqs | minor | v0.0.0 |
@backstage/plugin-events-backend-module-azure | plugins/events-backend-module-azure | minor | v0.0.0 |
@backstage/plugin-events-backend-module-bitbucket-cloud | plugins/events-backend-module-bitbucket-cloud | minor | v0.0.0 |
@backstage/plugin-events-backend-module-gerrit | plugins/events-backend-module-gerrit | minor | v0.0.0 |
@backstage/plugin-events-backend-module-github | plugins/events-backend-module-github | minor | v0.0.0 |
@backstage/plugin-events-backend-module-gitlab | plugins/events-backend-module-gitlab | minor | v0.0.0 |
@backstage/plugin-events-backend-test-utils | plugins/events-backend-test-utils | minor | v0.0.0 |
@backstage/plugin-events-backend | plugins/events-backend | minor | v0.0.0 |
@backstage/plugin-events-node | plugins/events-node | minor | v0.0.0 |
Maybe I should rename @backstage/plugin-events-backend-module-sqs to @backstage/plugin-events-backend-module-aws[-]sqs 🤔
I think I will move the event bus implementation to the events-node module so that events-backend-modules can access it, e.g., in tests.
Great work on this, @pjungermann! I'm wondering how this differs from doing something like what's been documented by Frontside in this tutorial, from an SCM perspective:
https://frontside.com/blog/2022-05-03-backstage-entity-provider/
I have been looking for an event-driven solution for Bitbucket Server where I am able to capture the delta of what's been changed in the entity. I was going to give that tutorial a try, but your solution seems a lot more elegant. Are you capturing the deltas in Bitbucket Cloud?
@regicsolutions thanks, I think I read this article when Taras shared it on Discord.
Overall, this PR is based on the RFC #11082 and discussions on Discord.
The solution described there is possible with this setup, too, even though it still triggers a full refresh/mutation.
feat(events,catalog/bitbucketCloud): handle repo:push events
This handles deltas for Bitbucket Cloud. You can also schedule full refreshes (e.g., once a day/week/month/quarter/year/...).
You can decide whether you want your SCM provider/system to push events directly to an HTTP endpoint or pull them from an AWS SQS queue (where received webhook events are put). These are the included options so far; I plan to use AWS SQS for our org. You can add your own solution (contributed or private to your setup), too.
The webhook events of the different SCM providers (GitHub, GitLab, Azure DevOps, etc.) are similar but differ in detail. For Bitbucket Cloud, we only have references to the commits (potentially truncated) and no information about what exactly changed. As the goal is to reduce API calls, I kept it simple. GitHub's events carry some additional information which can even help to ignore certain push events (e.g., if there is no change to the catalog file).
The Bitbucket Cloud implementation was my original motivation to start the generic setup and could have been moved to a separate PR; however, I thought it made sense to include at least one subscriber implementation.
Whenever this PR is accepted and merged, you can add the bits for Bitbucket Server. Not sure how its events differ from Bitbucket Cloud. Might be pretty similar.
I guess implementations for GitHub, etc. will follow, too.
PS: Let me know if you see anything missing.
Makes sense, appreciate the detailed response. Looking forward to it!
Hey folks. Thanks for the great work, @pjungermann! I've tried it out, and there are two things that I'm wondering about:
While an EventSubscriber can subscribe to many topics (array in supportsEventTopics), the information on which topic an event came from is not provided to onEvent. I think this would be a useful addition, for example when one wants to consume multiple different webhooks with the same subscriber.
From a security perspective, I'm wondering if we could have some sort of validation mechanism for the incoming webhooks. A post-receive hook that gets called with the request after the payload is received, but before it is published to the event bus.
A quick draft of what I mean:
private addRouteForTopic(
  router: express.Router,
  topic: string,
  receiveHook?: (request: express.Request) => boolean,
): void {
  const path = `/${topic}`;
  router.post(path, async (request, response) => {
    const eventPayload = request.body;
    if (receiveHook && !receiveHook(request)) {
      response.json({ status: 'rejected' });
    } else {
      await this.eventBus!.publish(topic, eventPayload);
      response.json({ status: 'ok' });
    }
  });
  this.logger.info(
    `Registered /api/events/module/http${path} to receive events`,
  );
}
While an EventSubscriber can subscribe to many topics (array in supportsEventTopics), the information on which topic an event came from is not provided to onEvent. I think this would be a useful addition, for example when one wants to consume multiple different webhooks with the same subscriber.
Good point. I can add the topic to the signature.
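A minimal sketch of what passing the topic through to the subscriber could look like. The `EventParams`, `EventSubscriber`, and `InMemoryEventBroker` shapes below are assumptions made for illustration and are not the actual API of this PR:

```typescript
// Hypothetical shapes for illustration only; the real interfaces may differ.
interface EventParams {
  topic: string;
  eventPayload: unknown;
}

interface EventSubscriber {
  supportsEventTopics(): string[];
  onEvent(params: EventParams): Promise<void>;
}

// A simple in-memory broker that forwards the topic along with the payload.
class InMemoryEventBroker {
  private readonly subscribersByTopic = new Map<string, EventSubscriber[]>();

  subscribe(subscriber: EventSubscriber): void {
    for (const topic of subscriber.supportsEventTopics()) {
      const list = this.subscribersByTopic.get(topic) ?? [];
      list.push(subscriber);
      this.subscribersByTopic.set(topic, list);
    }
  }

  async publish(params: EventParams): Promise<void> {
    // Only subscribers registered for exactly this topic receive the event.
    const subscribers = this.subscribersByTopic.get(params.topic) ?? [];
    await Promise.all(subscribers.map(s => s.onEvent(params)));
  }
}
```

With this, a single subscriber registered for several topics can branch on `params.topic` inside `onEvent`.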
From a security perspective, I'm wondering if we could have some sort of validation mechanism for the incoming webhooks. A post-receive hook that gets called with the request after the payload is received, but before it is published to the event bus.
So far I expected consumers to validate / filter events if needed (e.g., sometimes there are sub-types which cannot be extracted in a generic way) or to use more specialized topics (e.g., bitbucketCloud vs bitbucketCloud/repo:push).
Data is passed as unknown.
We could add another extension point for this validation. Or we could expect the EventPublisher to take care of this -- and for those getting them from external, they could either already implement it or allow for custom logic to be passed.
What do you think @x3ro ?
I'm not sure I fully understand your comment.
So far I expected consumers to validate / filter events if needed [...]
This works if the information necessary for validation is included in the payload. In the case of the GitHub webhook signature, it's passed as a header, so this approach would not work.
What I wrote is essentially only relevant for potentially untrusted data I get from outside. For example, if I create a topic for GitHub webhooks, how do I make sure that the data I receive on that endpoint is actually coming from GitHub?
Or we could expect the EventPublisher to take care of this -- and for those getting them from external, they could either already implement it or allow for custom logic to be passed.
This seems roughly what I have in mind.
In essence, I'm wondering if the HttpEndpointEventPublisher can be made customizable enough to allow for untrusted input to be validated before being published as an event. If merged in its current form, we would have to write our own version of the http endpoint publisher for that.
I also just realized that, in my example above, I modified a private method, so that would of course not work.
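One way to picture the extension point, as a framework-free sketch: validators are registered per topic and run before anything is published. All names here (`IncomingRequest`, `RequestValidator`, `createIngress`) and the 403 status for rejected requests are assumptions for illustration, not the publisher's real API:

```typescript
// Hypothetical request shape; a real implementation would use express types.
interface IncomingRequest {
  headers: Record<string, string | undefined>;
  body: unknown;
}

// Returns true when the request may be published as an event.
type RequestValidator = (request: IncomingRequest) => boolean;

interface IngressResult {
  status: number;
  body: { status: 'ok' | 'rejected' };
}

// Builds a handler that validates a request before publishing it to a topic.
function createIngress(
  publish: (topic: string, payload: unknown) => void,
  validators: Record<string, RequestValidator> = {},
) {
  return (topic: string, request: IncomingRequest): IngressResult => {
    const validate = validators[topic];
    if (validate && !validate(request)) {
      // Rejected requests never reach the event broker.
      return { status: 403, body: { status: 'rejected' } };
    }
    publish(topic, request.body);
    return { status: 200, body: { status: 'ok' } };
  };
}
```

Topics without a registered validator are passed through unchanged, which keeps the default behaviour for trusted sources.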
@x3ro I've added both. Let me know if this works for you.
@freben Would be great to get another review round from you
Yahp I'll give it another look soon, let's go after the release anyway
Hey, @pjungermann, great work!
I would also like to suggest a small addition to the onEvent callback function. Since the request payload has a different type/structure for almost every event, why not add a third parameter to the onEvent callback, event: string (i.e., request.headers['x-github-event']), or just pass the entire Request object instead of just the payload? That would help differentiate events in relation to payloads. For example:
const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, event: string, payload: any) => {
    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];
// or
const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, request: Request) => {
    const event = request.headers['x-github-event'];
    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];
Since the header name is provider-specific, it would probably be better to pass the request context as the third parameter. For example:
const subscribers = [{
  supportsEventTopics: () => ['github'],
  onEvent: async (topic: string, payload: any, ctx: Request) => {
    const event = ctx.get('x-github-event'); // or ctx.headers['x-github-event']
    if (event === 'repository' || event === 'push') {
      // apply an update
    }
  }
}];
@frenchbread Thank you for your comment and ideas!
In general, I would not like to add Request to the subscriber's onEvent signature, as an event does not necessarily originate from an HTTP request; compare the SQS-based publisher with the HTTP-based publisher. event seems tricky to provide.
Also, the implementation is supposed to be generic and not only for webhooks as provided by the SCM systems like GitHub, GitLab, Bitbucket Cloud, etc.
I understand and share your general intent, though, to get access to the event type and, in the best case, even to subscribe only to a certain type of webhook event.
Before, I was wondering whether we could add this to the topic so that you can subscribe to specific webhook event types like github/repository or github/push instead of just github or bitbucketCloud/repo:push, etc. instead of just bitbucketCloud.
With the current setup, we could already use such more specific topics, however this would result in separate endpoints for each type of webhook event which means you would need to add separate subscriptions at the SCM provider, too. And these might have limits on the amount of subscriptions.
We could extract the information as long as it is provided in some form, but as they provide this in different ways it is a bit challenging to do this in a generic/agnostic way.
- GitHub: header X-GitHub-Event.
- Bitbucket Cloud: header X-Event-Key.
- GitLab: header X-Gitlab-Event (e.g., "Push Hook") and $.event_name (e.g., "push") in the payload.
- Azure DevOps: $.eventType in the payload (e.g., "git.push").

GitHub, GitLab and AzureDevOps support webhook secrets using the headers X-Hub-Signature and X-Hub-Signature-256 (when used with a secret).
This was brought up by @x3ro above and could be solved using the added validator support.
We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher. 🤔
You could also think about whether we need generic support instead of adding something like an HttpGithubWebhookPublisher, etc.
With HTTP/SQS/... all supported as options, this may get a bit more complicated.
We could consider having something like:
HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be a x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔
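The subscriber-and-publisher idea could be sketched roughly as follows. `EventParams`, `Publish`, and `createGithubRouter` are hypothetical names introduced for this illustration, and the `github/<event>` topic format simply follows the examples in this comment:

```typescript
// Hypothetical event shape with optional transport metadata (e.g., HTTP headers).
interface EventParams {
  topic: string;
  eventPayload: unknown;
  metadata?: Record<string, string>;
}

type Publish = (params: EventParams) => void;

// Consumes events from the generic "github" topic and re-publishes them
// under a more specific topic derived from the x-github-event metadata entry.
function createGithubRouter(publish: Publish) {
  return (params: EventParams): void => {
    const eventType = params.metadata?.['x-github-event'];
    if (params.topic !== 'github' || !eventType) {
      // Not a GitHub event, or no event type available: nothing to route.
      return;
    }
    publish({ ...params, topic: `github/${eventType}` });
  };
}
```

Subscribers interested only in pushes would then subscribe to `github/push`, while the webhook itself still targets a single endpoint.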
Another option might be to allow to extract the sub-topic/event type from a given payload property or header (Azure DevOps: property $.eventType, Bitbucket Cloud: header x-event-key, GitHub: header x-github-event, GitLab: property $.event_name) which could be configured in a generic way via app-config. Not sure if this is a great approach though.
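To make the configurable variant concrete, here is a rough sketch. The header and property names are the ones listed above; the `SubTopicSource` type, the `subTopicSources` table, and `resolveSubTopic` are hypothetical and only stand in for whatever would be read from app-config:

```typescript
// Where to find the event type for a given topic: a header or a payload property.
type SubTopicSource =
  | { from: 'header'; name: string }
  | { from: 'payload'; property: string };

// Hypothetical configuration; in practice this might come from app-config.
const subTopicSources: Record<string, SubTopicSource> = {
  azureDevOps: { from: 'payload', property: 'eventType' },
  bitbucketCloud: { from: 'header', name: 'x-event-key' },
  github: { from: 'header', name: 'x-github-event' },
  gitlab: { from: 'payload', property: 'event_name' },
};

// Returns the sub-topic/event type for an event, or undefined if unavailable.
function resolveSubTopic(
  topic: string,
  metadata: Record<string, string>,
  payload: unknown,
): string | undefined {
  const source = subTopicSources[topic];
  if (!source) {
    return undefined;
  }
  if (source.from === 'header') {
    return metadata[source.name];
  }
  const value = (payload as Record<string, unknown> | null | undefined)?.[source.property];
  return typeof value === 'string' ? value : undefined;
}
```

The sketch assumes header names arrive lower-cased in the metadata, as Node's HTTP stack does for incoming headers.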
Just some ad-hoc thoughts. Let me know what you think.
@freben thanks for the comments. I will go through them in detail soon.
As you were just on it: Do you have any thoughts on the discussion above?
Currently, I consider
HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be a x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔
let me get back on that topic :) getting late
We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher. 🤔
I have added metadata support for now, which will be filled with headers (http) or message attributes (sqs).
E.g., used like
if (params.metadata?.['x-event-key'] === 'repo:push') {
  await this.onRepoPush(params.eventPayload as Events.RepoPushEvent);
}
I looked into the failed test (e2e test on Windows; it failed at yarn install after creating a new app). Locally on Mac it worked fine, though.
I assume it is unrelated to this change.
@pjungermann thanks for your thoughts and implementation!
Before, I was wondering whether we could add this to the topic so that you can subscribe to specific webhook event types like github/repository or github/push instead of just github or bitbucketCloud/repo:push, etc. instead of just bitbucketCloud.
With the current setup, we could already use such more specific topics, however this would result in separate endpoints for each type of webhook event which means you would need to add separate subscriptions at the SCM provider, too. And these might have limits on the amount of subscriptions.
I liked your idea of the topic-route approach, but as you mentioned, it would require adding multiple (or a lot of) webhooks, which is not an easily manageable solution.
We could add metadata?: Record<string, string> to the signature as key-value object which contains the headers for the case of the HTTP-based publisher or message attributes from the SQS messages at the SQS-based publisher.
HttpEndpointEventPublisher publishes an event for github and then we have a subscriber (and publisher) which consumes the events and knows that there will be a x-github-event entry at the metadata and uses this to publish github/push which then can be consumed by subscribers interested in that only. 🤔
Both of these options sound good to me. The metadata-based approach basically extends what I've suggested. Thanks for adding it!
@frenchbread the metadata part is already integrated here (incl. support at http and sqs transports) and I worked on event router implementations to make it easier for subscribers. I will push these soon as well.
As you noted, I also decided against the subscription-per-webhook-event-type option, even though it can still be used if an org opts for it (e.g., if you are only interested in one or a few of these).
@frenchbread I have added the event routers, too.
What I didn't implement is the signature verification:
GitHub, GitLab and AzureDevOps support webhook secrets using the headers X-Hub-Signature and X-Hub-Signature-256 (when used with a secret).
@pjungermann That's great!
What I didn't implement is the signature verification
Right, I'm thinking that verification can be implemented separately for each entity provider. E.g., for github it's implemented within octokit/methods. Not sure about gitlab and bitbucket.
Right, I'm thinking that verification can be implemented separately for each entity provider.
I think there are various options. E.g., it could be done using a request validator at the HTTP transport (module "http"), which will receive HTTP requests from GitHub for the topic github.
No other subscriber (like an entity provider, ...) would need to do it on its own and could rely on verified events.
On the other hand, we have other transports like using SQS. I didn't add any validator support there as I assume that this can be managed externally and all messages are "verified".
Another option could be to do it at the event router. This would mean that events for sub-topics will be verified; however, other subscribers to the general topic (e.g., github) would receive unverified events.
Or we could do it at each subscriber. That seems to be the most defensive option. It might cause the same event to be verified multiple times though which seems unnecessary.
Not sure about gitlab and bitbucket.
Bitbucket Cloud does not support this feature; however, Azure DevOps does, based on what I read.
They all seem to use the x-hub-signature-256 header with an HMAC hex digest. An implementation is potentially reusable across all three.
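A reusable check could look roughly like this sketch, using Node's built-in crypto module. The `sha256=` prefix follows GitHub's documented header format; whether other providers use the identical format is an assumption here, and `verifyHubSignature256` is a hypothetical helper name:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Verifies an x-hub-signature-256 style header ("sha256=<hex digest>")
// against the raw request body and the shared webhook secret.
function verifyHubSignature256(
  secret: string,
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
): boolean {
  if (!signatureHeader) {
    return false;
  }
  const expected =
    'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const expectedBuffer = Buffer.from(expected);
  const actualBuffer = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expectedBuffer.length === actualBuffer.length &&
    timingSafeEqual(expectedBuffer, actualBuffer)
  );
}
```

Note that the digest has to be computed over the raw bytes as received; re-serializing an already parsed JSON body can change whitespace or key order and break the comparison.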
Another option for signature verification:
We could use the verify option of the body parser:
express.json({
  verify: (req, res, buf, encoding) => {
    // ...
  },
});
This would be a rather generic implementation though applied to all incoming requests. Of course, if there is no such header, it would pass.
GitLab uses X-Gitlab-Token, which just passes the secret token that was defined at GitLab / at the webhook subscription. It is not using HMAC hex digests with the payload and secret.
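For the token variant, the check reduces to comparing the header value with the configured secret. A minimal sketch, where `verifyGitlabToken` is a hypothetical helper and the constant-time comparison is a defensive choice rather than something GitLab prescribes:

```typescript
import { timingSafeEqual } from 'node:crypto';

// Compares the X-Gitlab-Token header value with the configured secret token.
function verifyGitlabToken(
  expectedToken: string,
  tokenHeader: string | undefined,
): boolean {
  if (!tokenHeader) {
    return false;
  }
  const expectedBuffer = Buffer.from(expectedToken);
  const actualBuffer = Buffer.from(tokenHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expectedBuffer.length === actualBuffer.length &&
    timingSafeEqual(expectedBuffer, actualBuffer)
  );
}
```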
I've implemented the signature/token verification for GitHub and GitLab, too, however I think it should be moved to a separate PR.
FYI: as discussed, due to absences among the maintainers and the need to discuss this properly, the topic will be on hold until the end of next week.
The current failure is unrelated to the changes, and I will attempt a retry by re-uploading the last commit.
FAIL plugins/catalog-backend/src/service/DefaultRefreshService.test.ts (68.141 s)
● Refresh integration › should refresh the parent location, "POSTGRES_13"
@freben I've pushed all changes as discussed on Discord. "http" module is now part of "events-backend" and "events-node".
@pjungermann is it possible to add bitbucketServer support?
@pjungermann is it possible to add bitbucketServer support?
So far, there are only event-based entity updates / ingestion for Bitbucket Cloud and event router implementations for the others, excluding Bitbucket Server.
You can add this for Bitbucket Server as soon as this PR is merged, though.
Personally, I will not add support for it as we don't use it and I have no way to test it anyway. Also, I have no documentation about its webhooks, so I didn't add the event router either. Feel free to contribute.
Can't wait for the next release!!! Awesome work @pjungermann!!!
@awanlin it made it into 1.8.0 last Tuesday :-)
feat(events): add events management capabilities
This change introduces some new plugins which provide the basics for managing events inside of backstage. Hereby, it offers extension points to add event publishers and subscribers as well as to exchange the event broker implementation.
- @backstage/plugin-events-backend: backend for the events management which connects all parts and provides a simple in-memory event broker
- @backstage/plugin-events-node: interfaces and API for @backstage/plugin-events-backend
- @backstage/plugin-events-test-utils: test utilities like implementations useful for writing tests at modules

All plugins support the new backend-plugin-api.
Relates-to: #11082
feat(events/http): add HTTP endpoint-based event publisher
This plugin adds an event publisher which receives events via (an) HTTP endpoint(s) and can be used as destination at webhook subscriptions.
Relates-to: #11082
feat(events,example): integrate at example backend
Integrate plugins-events-backend with plugins-events-backend-module-http at the example backend.
feat(events,example): add simple way to add event-based entity providers
Add DemoEventBasedEntityProvider as example implementation.
feat(events/sqs): add a new AWS SQS event publisher
This change introduces a new plugin @backstage/plugin-events-backend-module-sqs.
This plugin provides an event publisher which receives events from (an) AWS SQS queue(s) and publishes them to the event broker.
The plugin supports the new backend-plugin-api and connects with the other plugins.
feat(events/bitbucketCloud): add BitbucketCloudEventRouter
Add an event router for Bitbucket Cloud which handles events from the topic bitbucketCloud and re-publishes events under their more specific topic based on the x-event-key metadata, e.g., bitbucketCloud.repo:push.
fix(catalog/bitbucketCloud): fix test file name
The file was forgotten to be adjusted as part of PR #13859.
Relates-to: PR #13859
feat(events,catalog/bitbucketCloud): handle repo:push events
Handle Bitbucket Cloud repo:push events at the BitbucketCloudEntityProvider by subscribing to topic bitbucketCloud.repo:push.
Implements EventSubscriber to receive events for the topic bitbucketCloud.repo:push.
On repo:push, the affected repository will be refreshed. This includes adding new Location entities, refreshing existing ones, and removing obsolete ones.
To support this, a new annotation bitbucket.org/repo-url was added to Location entities.
A full refresh will require 1 API call to Bitbucket Cloud to discover all catalog files. When we handle one repo:push event, we also need 1 API call in order to know which catalog files exist. This may lead to more discovery-related API calls (code search). The main cause for hitting the rate limits are Locations refresh-related operations.
A reduction of total API calls to reduce the rate limit issues can only be achieved in combination with
For (2.), it is not possible to reduce the frequency only for Bitbucket Cloud-related Locations though.
Further optimizations might be required to resolve the rate limit issue.
Relates-to: #10866
feat(events/github): add GithubEventRouter
Add an event router for GitHub which handles events from the topic github and re-publishes events under their more specific topic based on the x-github-event metadata, e.g., github.push.
feat(events/gitlab): add GitLabEventRouter
Add an event router for GitLab which handles events from the topic gitlab and re-publishes events under their more specific topic based on the $.event_name payload field, e.g., gitlab.push.
feat(events/azure): add AzureDevOpsEventRouter
Add an event router for Azure DevOps which handles events from the topic azureDevOps and re-publishes events under their more specific topic based on the $.eventType payload field, e.g., azureDevOps.git.push.
feat(events/gerrit): add GerritEventRouter
Add an event router for Gerrit which handles events from the topic gerrit and re-publishes events under their more specific topic based on the $.type payload field, e.g., gerrit.change-merged.
Hey, I just made a Pull Request!
This PR will introduce event management without any persistence or distribution inside of the cluster through the event broker. It can be used for reacting to webhook events by SCM providers like GitHub or Bitbucket Cloud (included), however it is not limited to these use cases. All parts can be extended (or replaced in case of the event broker) using additional modules to customize it for your needs.
Via the http and sqs modules, you have two available options on how to receive events from the outside (http: HTTP POST requests to designated endpoints, sqs: as messages through AWS SQS queues). Further options for event publishers can be added (internal or wrapping external sources).