Introduce Multiplexing Transport

AbhiPrasad commented 2 years ago

Problem Statement

Usage of multiple hubs/clients has been a long standing issue within the SDK. https://github.com/getsentry/sentry-javascript/issues?q=is%3Aopen+is%3Aissue+label%3A%22Feature%3A+Multiple+hubs%2Fclients%22

A big use case for using multi-hub is to send Sentry event's to different projects. By introducing an optional transport multiplexer, we can allow users to route envelopes to different DSNs (or the same envelope to multiple DSNs).

This means users can route events to different Sentry projects based on any characteristics of an event - like tags, stacktrace or contexts

Solution Brainstorm

Most transports constructors right now (makeFetchTransport, makeXHRTransport) are a function that takes some transport options and returns a transport.

transport: (options: TransportOptions) => Transport

We could then make a wrapper class typed like so:

type DsnToMatcher = {
  // TODO: Adjust matcher function type based on UX needs
  [dsn: string]: (type: string, envelopeItemBody: unknown, envelopeItem: BaseEnvelopeItem) => boolean;
}

type TransportCreator = (options: TransportOptions) => Transport
type TransportMultiplexer = (opts: DsnToMatcher, wrappedTransport: TransportCreator) => TransportCreator;

It's just a wrapper func, so it would be used like so:

Sentry.init({
  dsn: ...,
  transport: multiplexTransport(makeFetchTransport),
});

There are a couple open questions here:

Do we make this errors/transaction only? What do we do about sessions, attachments and other envelope items
How do we easily design the API so that it can we can route to different DSNs and send envelopes to multiple DSNs at the same time.
Do we expose entire envelope items? Or just the body.

There is an argument that this needs can just be a code snippet in docs, instead of exporting something, but it would be nice to have this functionality out of the box to address some of the MFE concerns.

lobsterkatie commented 2 years ago

I think this is a good idea! Our current workaround (using beforeSend to do the routing) only works for errors, and it would be nice to have an easy way for folks to make this work.

A few thoughts:

Do we make this errors/transaction only? What do we do about sessions, attachments and other envelope items

I think it only makes sense if we do it for everything, otherwise you still have one project's data getting stored in another project. (This does beg the question of what to do when there are multiple items in an envelope, of course, but I think we can figure that out.)

How do we easily design the API so that it can we can route to different DSNs and send envelopes to multiple DSNs at the same time.

The matcher that you're envisioning - is the idea that for each DSN, we'd run the corresponding matcher, and if it returned true, we'd send to that DSN, and if it returned false, we wouldn't?

It feels much more intuitive to me (and is probably less code for users) to have people write one function (rather than one per DSN) and have it return the DSNs which should receive the event/session/etc. (We can figure out the easiest way to handle default behavior, if users want that for most stuff by not everything.)

Do we expose entire envelope items? Or just the body.

I def think just the body. There's nothing in the headers people need, IMHO, and besides, envelopes are an implementation detail that I'd rather not tie us to with public API.

yordis commented 2 years ago

type DsnToMatcher = { // TODO: Adjust matcher function type based on UX needs [dsn: string]: (type: string, envelopeItemBody: unknown, envelopeItem: BaseEnvelopeItem) => boolean; }

Something that I wish happens moving into this architecture is that the DSN is a configuration of the transporter instead of the Hub since it feels that is where it suppose to be.

Obviously, this will introduce some concerns about wiring things up (I think).

Back to the feature,

One major question,

Normally in these middleware/pipelines, the message will have 2 sections (data, and metadata), where the data is the main information about the Message payload, and the metadata is where pipelines will use to do things.

In this particular case, do you have such a thing?

I def think just the body. There's nothing in the headers people need, IMHO, and besides, envelopes are an implementation detail that I'd rather not tie us to with public API.

It seems that you kind of has some separations.

So maybe leveraging something like scope or whatever that could add some metadata could solve the issue.

Making a transporter just one step in the pipeline that doesn't call next (like express).

// making things up to illustrate how it could feel
Sentry.captureException(error, { metadata: { multiplexing: ['microfrontend-1'] });

Instead of being coupled to DNS, the user decide however they want to do the multiplexing into some transporter, including using DNS if that is what they want.

The transporter could look like the following

Sentry.init({
  transport: multiplexTransport({
    'microfrontend-1': makeFetchTransport({ dsn: '' }),
  }),
});

This will make the Hub just a collector of Messages, but it is not aware of what happens with the messages, eventually sending them to WebWorker, or Fetch, or a combination of them.

lobsterkatie commented 2 years ago

So maybe leveraging something like scope or whatever that could add some metadata could solve the issue.

Making a transporter just one step in the pipeline that doesn't call next (like express).
// making things up to illustrate how it could feel
Sentry.captureException(error, { metadata: { multiplexing: ['microfrontend-1'] });
Instead of being coupled to DNS, the user decide however they want to do the multiplexing into some transporter, including using DNS if that is what they want.

The transporter could look like the following
Sentry.init({
  transport: multiplexTransport({
    'microfrontend-1': makeFetchTransport({ dsn: '' }),
  }),
});
This will make the Hub just a collector of Messages, but it is not aware of what happens with the messages, eventually sending them to WebWorker, or Fetch, or a combination of them.

Yeah, something like this, except that I think having individual transports adds another layer of complication beyond what we're talking about here, which at least for now is simply, "I want to use a single transport, except that I want to be able to choose the destination for each event/session/etc that transport handles rather than having everything sent to a single place." For the vast majority of users, that single transport will be the default one, and so I'd rather only make them think about the routing piece.

yordis commented 2 years ago

Sentry.init({
  transport: multiplexTransport({
    'microfrontend-1': { dsn: '' },
  }),
});

Then that?

From my perspective, having a more scalable solution at the cost of an extra complexity will be worth it, otherwise, breaking changes will continue to happen due to limitations from the architectural perspective.

For example,

What if one of the services has rate-limiting while the other doesn't?

What if one of the services needs some special header to bypass some reverse proxy for whatever reason why the other one doesn't?

At that point, the transport itself must change, not just the DSN.

And as I mentioned before, Sentry setup is a one-time configuration problem, so good documentation, clear directions, and whatnot compensate for flexible architectures that require a bit of wiring.

Also, you can always give me the default wiring.

Thoughts?

lobsterkatie commented 2 years ago

Sentry.init({
 transport: multiplexTransport({
   'microfrontend-1': { dsn: '' },
 }),
});

Then that?

I was thinking something in the middle in terms of complexity:

const dsns = {
  things: '...',
  stuff: '...',
  default: '...',
};

Sentry.init({
  transport: multiplexTransport({
    transport: someTransport, // would be optional so that the default http or fetch transport could be used
    getDSNs: (event, hint) => {
      if (event.tags.component === 'something') {
        return [dsns.stuff];
      }

      if (hint.originalException.target === 'somethingElse') {
        return [dsns.things, dsns.stuff];
      }

      return [dsns.default];
    },
  }),
});

Those are all valid use cases, but also probably at most the 5% of the 5% who would be using any version of this feature in the first place. I hear you about breaking changes, but I also think there's real danger in having too many levers, however well documented. (Not that we're anywhere close yet, but webpack is a great example of this. Good docs, still pretty hard to set up.) Also, the point of this feature is really ease of use, and IMHO it's definitely easier just to provide a single function than multiple transports which in most cases will differ only by destination.

My proposal would be to implement

type MultiplexTransportOptions = {
  transport?: Transport,
  getDSNs: (event: Event, hint: EventHint) => string[]
}

function multiplexTransport(options: MultiplexTransportOptions): Transport {...}

first, and then if there's clamor for something more flexible, we consider changing it to

type MultiplexTransportOptions = {
  transport?: Transport;
  getDSNs?: (event: Event, hint: EventHint) => string[];
  getTransport?: (event: Event, hint: EventHint) => Transport
};

and then people could provide either getDSNs alone (to use the default transport), getDSNs and transport (to use the same custom transport going to multiple places), or getTransport (to be able to control everything). That way it's not actually breaking and we can do it whenever we want (if we decide it's a good idea).

In the meantime, anyone with that specific a use case wouldn't be out of luck - they could just make their own custom transport (without the multiplexer) - they'd just be lacking the sugar that the folks with the more common use case would be getting.

yordis commented 2 years ago

I like your idea to use tags and let the user do whatever they want

getsentry / sentry-javascript

Introduce Multiplexing Transport #5185

Problem Statement

Solution Brainstorm