Revamp the Integrations API

AbhiPrasad commented 2 years ago

Problem Statement

Warning: This is a WIP brainstorming doc

Similar to work done in https://github.com/getsentry/sentry-javascript/issues/4660 for the transports, we can update things for the integrations also.

Solution Brainstorm

In this ideal world, we would like to pass in the hub directly to the integration.

Something like

interface NewIntegration {
  name: string;
  setup(hub: Hub): void;
}

The issue here is that this does not work in the node SDK, which relies on storing the hub on the domain. A new hub is constructed for every, but it re-uses the client, so the integrations will only be setup once. This means that they only get a reference to the original hub, not the new hub that comes from each domain.

https://github.com/getsentry/sentry-javascript/blob/37b14bf2273801fa40d4aea22ab925d7753a0a40/packages/hub/src/hub.ts#L601

We could have two different hub modes though. In the first hub mode, there is some async storage that stores the hub (domains, async hooks, zonejs etc.) - so getCurrentHub returns the hub thats stored in async storage. In the second hub mode, getCurrentHub just returns the hub that originally created the client + passed in integrations.

interface NewIntegration {
  name: string;
  // does different things based on environment.
  setup(getCurrentHub: () => Hub): void;
}

I recognize this is kind of confusing though, so would appreciate and suggestions for a better API here. Struggling to generate ideas beyond this.

One important thing to note is event processors. We'll need to add event processors to hubs to make this change happen. In addition, we can change the event processor API so that (optionally) we give it the name of the integration that created it. This allows for easier debugging and logging, so users can understand if an event processor dropped an event, why it happened.

type addEventProcessor = (processor: EventProcessor, name?: string) => void;

timfish commented 2 years ago

I need to take a proper look over the existing code before I take an opinion on the getCurrentHub() changes.

However two thumbs up for adding name to event processors. 👍👍 If I understand correctly, this isn't even a breaking change.

yordis commented 2 years ago

What is the usage of those name that I see in the integrations (and now) here @AbhiPrasad?

Are they for logging things out or enforcing uniqueness or what?

kamilogorek commented 2 years ago

@yordis filtering out default integrations - https://docs.sentry.io/platforms/javascript/configuration/integrations/default/#removing-an-integration

We could use instanceof but it was more obvious for the end users this way.

yordis commented 2 years ago

We could use instanceof but it was more obvious for the end users this way.

Or the need to have it would go away by introducing a different API: #4789

Which is where my mind went ... but I don't have the full context of it.

timfish commented 2 years ago

As well as helping remove integrations from the defaults, the integration name property is also used in a few places internally to check for an integration without having to reference it and have it included in the bundle.

We could have two different hub modes though...

I finally had a chance to look into this and I think it's is a nice approach since it makes getCurrentHub() do what I expected it to do!

yordis commented 2 years ago

for an integration without having to reference it and have it included in the bundle.

I see, yeah definitely a problem if you introduce web-workers or things that cant hold the ref to the function.

mitsuhiko commented 2 years ago

Generally speaking I’m struggling to understand why setup needs a hub at all instrad of calling getCurrentHub as needed. Under which circumstances does it make sense to initialize an integration twice with another hub?

kamilogorek commented 2 years ago

Every integration that's configurable. Config is held per integration instance. If we use getCurrentHub, then it's not possible to have for example 2 clients with 2 different RewriteFrames integration.

AbhiPrasad commented 2 years ago

Armin and I had a chat, summarizing some of the takeaways from there.

It's important to note that our integrations have two primary roles. First, they patch something to make sure Sentry reports errors/breadcrumbs/spans as appropriate. For example, we patch the global error handler to get errors. This usually involves using the hub level methods like captureException or withScope. Second, they process events using eventProcessors

First, let's categorize our integration to make it clear what they are doing. Other than the Offline integration (which we are deleting in v7 anyway), each integration very clearly chooses one of these two roles - but they are fairly split between them. Both Yordis and Armin brought up splitting integrations so each are focused - but this would be a much more disruptive breaking change.

	Integration	Patching	EventProcessor
Browser
	Breadcrumbs	✅
	Dedupe		✅
	GlobalHandlers	✅
	LinkedErrors		✅
	TryCatch	✅
	UserAgent		✅
Core
	FunctionToString	✅
	InboundFilters		✅
Node
	Console	✅
	ContextLines		✅
	Http	✅
	LinkedErrors		✅
	Modules		✅
	OnUncaughtException	✅
	OnUnhandledRejection	✅
Integrations
	CaptureConsole	✅
	Debug		✅
	ExtraErrorData		✅
	Offline	✅	✅
	ReportingObserver	✅
	RewriteFrames		✅
	SessionTiming		✅
	Transaction		✅

Some Context on Hubs

The current intention of the getCurrentHub API is that the integrations only patch a single time. In fact, they were meant to live alongside the hub and the client - not in the client as they are today in the JS SDK. They then just forward that information to the correct hub. In languages where the hub can flow properly with the flow of execution (thread locals in Python, async locals in .NET, domains/async hooks in NodeJS, context in Go), this API works fine, the correct hub should be chosen. Unfortunately, this breaks down in Browser JavaScript, where it is impossible for us to associate variables to an async context (without using something expensive byte-size wise like ZoneJS) - which has similar concerns in React Native and Electron.

Some SDKs, like the mobile ones (https://github.com/getsentry/sentry-cocoa/) have elected to only allow for a single hub to exist, but as folks move toward Micro Frontends, it seems we need to support the multiple hubs on a page pattern.

Given the original intention of the getCurrentHub API, and the current limitations of BrowserJS (I mean we could also do a language level async storage proposal if we joined tc39 :P) - we've outlined some things to think about - not in any order, just a brain dump of sorts.

Things to think about

Why do people want multiple hubs?

They want to send events from different parts of their frontend to different Sentry instances. Does this same logic apply to breadcrumbs? What about performance monitoring?

Which hub does a global error go to? We could multiplex to every hub, but does that create a good UX? We could also differentiate between things that happen in global handlers (like only send global stuff to the main Sentry.init hub), but this breaks down for breadcrumbs and spans as it's important that those go to the correct hub (or to all of them!).

Simulating ZoneJS

Given we patch a bunch of async methods in TryCatch, we could try to just make sure that the correct hub is kept in those places. The issue here is that this patch happens in an integration - so things would get tricky that way. Also, we would never patch as much as Zone does, so there would be holes.

Dispatch to every hub.

We can move integrations to live alongside the hub - but can be given to the hub to be created. So every time we add an integration, it gets added to a global list (similar to the global that track global event processors, and global var that stores the main Sentry hub). Then, every time we create a hub, it'll add itself to a global list of hubs.

Then we dispatch to every hub that has been registered, and if the hub says that it was created with this integration, it'll work!

// Creates hub and adds it to global list of hubs
// Add default integrations, makeInt1, and makeInt2 to global list of integrations
// hub keeps track of registered integrations -> default integrations, makeInt1, makeInt2
Sentry.init({ integrations: [makeInt1(), makeInt2()] });

const client = new Client(options);
// add makeInt3 to global list of integrations
// add otherHub to global list of hubs
const otherHub = new Hub(client, { integrations: [makeInt3()] });

setTimeout(() => {
  // will go to every hub that instrumented `setTimeout` using the `TryCatch` integration
  throw new Error("me");
});

Client event middleware

Inspired from conversation I had with Yordis earlier on, we rm -rf the idea of event processors in it's current state, and add what is essentially event processing middleware to the client. Then we can make integrations solely responsible for patching - and say that it only works with a single global hub (the one from Sentry.init). Additional hubs can only add event processors and pass the hub reference explicitly - but not grab stuff from the global integrations. This means we move everything in the client into event processing that just operates on it - all totally customizable.

https://github.com/getsentry/sentry-javascript/blob/58b3ddb4095df3c7798c5a0f699f5e5f49a2e750/packages/core/src/baseclient.ts#L525

Sentry.init({
  eventProcessors: ...,
  integrations: ...,
});

Client {
  _eventProcessors: [],
  _stackTraceProcessors: [],
  _transport:
}

I have more thoughts - but I'll come back to this in a bit.

philipphofmann commented 2 years ago

As pointed out by @AbhiPrasad, on Cocoa, most of our integrations rely on a global hub. They wouldn't work with multiple hubs, nor would the SDK currently work with multiple instances of specific integrations, for example, the ones that detect out-of-memory errors or application not responding. I think we could refactor these so you could have multiple integrations looking for application not responding, but it adds extra overhead. Until now, no customer on Cocoa asked for using multiple hubs, instead, most of them just use the static API, but of course, it could be that somebody will ask for that in the future. So if you want to use multiple hubs on Cocoa, I think instead of having multiple integrations, each integration should call multiple hubs instead.

AbhiPrasad commented 1 year ago

There's an argument we could use something like https://www.npmjs.com/package/@builder.io/partytown to offload logic. Related to https://github.com/getsentry/rfcs/pull/34.

timfish commented 1 year ago

It's worth noting that currently it's tricky to write unit tests for integrations due to the fact that all the required config isn't available until setupOnce is called. Even then, accessing things like the client options is a nightmare to mock:

  public setupOnce(addGlobalEventProcessor: (callback: EventProcessor) => void, getCurrentHub: () => Hub): void {
    const options = getCurrentHub().getClient()?.getOptions(); // mock me!

EDIT

Just noticed this has probably already been brought up: https://github.com/getsentry/sentry-javascript/blob/2c09d9a9a45994018ca934498609222c51d9f325/packages/replay/test/mocks/mockSdk.ts#L77-L78

yordis commented 1 year ago

Maybe the return should be a callback so you call it once you unregister the processor? Like a react useEffect callback

timfish commented 1 year ago

setupIntegrations and therefore setupOnce is called from the client so there's no reason I can think of not to pass the client when configuring integrations.

Ideally I'd like integrations more like this:

interface Hooks {
  beforeSend?: (event: Event) => Event | null;
  beforeBreadcrumb?: (event: Breadcrumb) => Breadcrumb | null;
  // loads more optional hooks 
}

// name + initFn
type Integration = [string, (client: Client) => Hooks]

Then the InboundFilters integration which currently looks like this: https://github.com/getsentry/sentry-javascript/blob/6a275c0b17bcf52e2d585e75d6c6829631107f93/packages/core/src/integrations/inboundfilters.ts#L16-L51

Would simply become:

function inboundFilters(opt: InboundFiltersOptions = {}): Integration {
  return [
    "InboundFilters",
    (client: Client) => {
      const options = _mergeOptions(opt, client.getOptions());

      return {
        beforeSend: (event) => _shouldDropEvent(event, options) ? null : event,
      };
    },
  ];
}

And would be used like:

init({
  dsn: '__DSN__',
  integrations: [
    inboundFilters({ allowUrls: [] })
  ]
});

If we add more and more hooks to the integration API, even more of the SDK functionality could be pulled out into optional integrations.

There might still remain the issue of integration ordering. I've found a few cases recently where ordering matters but that might just be because they're all working mostly through a single hook. I've only found the ordering to matter when an integration is dropping an event.

This could perhaps be improved if the dropping was only possible in a separate hook:

interface Hooks {
  beforeSend?: (event: Event) => Event;
  shouldSend?: (event: Event) => boolean;
}

yordis commented 1 year ago

I am liking what you are going for, just wonder. What is the difference between letting me register the hooks I want?

function inboundFilters(opt: InboundFiltersOptions = {}): Integration {
  return [
    "InboundFilters",
    (client: Client) => {
      const options = _mergeOptions(opt, client.getOptions());
      client.on('beforeSend', (event) => _shouldDropEvent(event, options));
    },
  ];
}

timfish commented 1 year ago

What is the difference between letting me register the hooks I want?

Should the hook 'subscriptions' and emitting be controlled from the client, the hub or some other new abstraction? As soon as you decide on client.on() you can't refactor without a breaking change. The client could use EventEmitter style internally and hookup the functions in the Hook interface without exposing that detail to the integrations.

Having an interface allows you to add properties too. For example it might make more sense to have integration name there so it's accessible when calling the hooks. Or maybe we'd end up needing to add a priority for ordering:

interface Hooks {
  name: string;
  priority: number;
  beforeSend?: (event: Event) => Event | null;
  // loads more optional hooks 
}

It also feels easier to test since you don't need to mock the client or EventEmiter. For example, the below implementation of inboundFilters can be tested without creating a full client and since InboundFiltersClient is a subtype of Client, it still matches the Integration interface:

// The only part of the Client we care about in this integration
interface InboundFiltersClient {
  getOptions(): Options;
}

function inboundFilters(opt: InboundFiltersOptions = {}): Integration {
  return (client: InboundFiltersClient) => {
    const options = _mergeOptions(opt, client.getOptions());

    return {
      name: 'InboundFilters',
      beforeSend: (event) => _shouldDropEvent(event, options) ? null : event,
    };
  },
}

test('test', () => {
  const init = inboundFilters();
  const hooks = init({ getOptions: () => ({ denyUrls: ["file:"] }) });
  const result = hooks.beforeSend?.({ url: "file://test.js" });
  expect(result).to.be.null();
})

yordis commented 1 year ago

I agree with some of the takes. Those are valid points.

My only concern is that, potentially, such an object becomes a swiss army knife and ends up being a bloated object trying to expose and add as many keys as possible to figure out how to hook into the client lifecycle.

Overall, I like it for sure, especially the testing situation. It is always a good sign of good design if it is easy to test.

🚀

getsentry / sentry-javascript