getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License

Revamp the Transports API #4660

Closed AbhiPrasad closed 2 years ago

AbhiPrasad commented 2 years ago

As part of https://github.com/getsentry/sentry-javascript/issues/4240#issuecomment-1035323682, we want to revamp the Transports API to make it easier to extend (adding support for attachments), consolidate the options, and reduce bundle size.

This issue forms a living document on how we are going to tackle this - first by setting up an API that can be (possibly) merged in before v7.

Much of this is inspired by the transport design of https://github.com/getsentry/sentry-javascript/blob/v7-dev/packages/transport-base/src/transport.ts

Options

To start, let's look at the current state of the options:

https://github.com/getsentry/sentry-javascript/blob/b1009b5766b34b77781e28cebbf6cbde870e01a2/packages/types/src/transport.ts#L51-L74

Doing a quick run-through, here's what that looks like:

export interface BaseTransportOptions {
  /** Sentry DSN */
  dsn: DsnLike;
  /** Define custom headers */
  headers?: { [key: string]: string };
  /** Set a HTTP proxy that should be used for outbound requests. */
  httpProxy?: string; // ONLY USED BY NODE SDK
  /** Set a HTTPS proxy that should be used for outbound requests. */
  httpsProxy?: string; // ONLY USED BY NODE SDK
  /** HTTPS proxy certificates path */
  caCerts?: string; // ONLY USED BY NODE SDK
  /** Fetch API init parameters */
  fetchParameters?: { [key: string]: string }; // ONLY USED BY BROWSER SDK
  /** The envelope tunnel to use. */
  tunnel?: string;
  /** Send SDK Client Reports. Enabled by default. */
  sendClientReports?: boolean; // ONLY USED BY BROWSER SDK ATM
  /**
   * Set of metadata about the SDK that can be internally used to enhance envelopes and events,
   * and provide additional data about every request.
   * */
  _metadata?: SdkMetadata;
}

This means we can probably reduce it down to:

export interface BaseTransportOptions {
  // url to send the event
  // transport does not care about dsn specifics - client should take care of
  // parsing and figuring that out
  url: string;
  headers?: Record<string, string>;
  bufferSize?: number; // make transport buffer size configurable
}

export interface BrowserTransportOptions extends BaseTransportOptions {
  // options to pass into fetch request
  fetchParams: Record<string, string>;
  sendClientReports?: boolean;
  // TODO: Add custom fetch implementation?
}

export interface NodeTransportOptions extends BaseTransportOptions {
  // Set a HTTP proxy that should be used for outbound requests.
  httpProxy?: string;
  // Set a HTTPS proxy that should be used for outbound requests.
  httpsProxy?: string;
  // HTTPS proxy certificates path
  caCerts?: string;
}

API

The transport does a couple of things in the SDK, but we can think of it mainly as turning a SentryRequest into a Response of some kind. Due to rate limiting, the transport is essentially a closed-loop control system, so the Response must be in a form that guides the transport (and the SDK in general) in making future decisions about how it should function.

SentryRequest<T> -> SentryResponse. SentryRequest becomes generic (defaulting to T = string), so transports can work with buffers/event-emitters/streams as the body if they so please. This also leaves us open to having non-HTTP transports (although we'll probably never ship one as a default).
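For illustration, the generic request shape could look something like this (a sketch only - the exact names and fields are not final):

```ts
// Illustrative only - the real type would live in @sentry/types and may differ.
interface SentryRequest<T = string> {
  type: SentryRequestType;
  body: T; // string by default, but could be a Buffer, stream, or event emitter
}
```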

Assuming that in v7, the client is in charge of creating the envelope (because it has to be able to dynamically add items).

interface INewTransport {
  // TODO: How to best attach the type?
  send(request: Envelope, type: SentryRequestType): PromiseLike<TransportResponse>;
  flush(timeout: number): PromiseLike<boolean>;
}

We can actually make this entirely functional:

function createTransport(options: TransportOptions): INewTransport {
  // some rate limiting logic
  // some transport buffer logic

  // this is a huge breaking change though
  // makes it way harder for users to implement their own transports
  // but is that a case we really care about?
  return { send, flush };
}

Sticking with classes, though:

/**
 * Heavily based on Kamil's work in
 * https://github.com/getsentry/sentry-javascript/blob/v7-dev/packages/transport-base/src/transport.ts
 */
export abstract class BaseTransport implements INewTransport {
  protected readonly _buffer: PromiseBuffer<TransportResponse>;
  private readonly _rateLimits: Record<string, number> = {};

  public constructor(protected readonly _options: BaseTransportOptions) {
    this._buffer = makePromiseBuffer(this._options.bufferSize || 30);
  }

  /** Serializes and sends an envelope, unless the category is currently rate limited. */
  public send(envelope: Envelope, type: SentryRequestType): PromiseLike<TransportResponse> {
    const request: TransportRequest = {
      // I'm undecided if the type API should work like this
      // though we are a little stuck with this because of how
      // minimal the envelopes implementation is
      // perhaps there is a way we can expand it?
      type,
      body: serializeEnvelope(envelope),
    };

    if (isRateLimited(this._rateLimits, type)) {
      return rejectedSyncPromise(
        new SentryError(`oh no, disabled until: ${rateLimitDisableUntil(this._rateLimits, type)}`),
      );
    }

    const requestTask = (): PromiseLike<TransportResponse> =>
      this._makeRequest(request).then(({ body, headers, reason, statusCode }): PromiseLike<TransportResponse> => {
        if (headers) {
          updateRateLimits(this._rateLimits, headers);
        }

        // TODO: This is the happy path!
        const status = eventStatusFromHttpCode(statusCode);
        if (status === 'success') {
          return resolvedSyncPromise({ status });
        }

        return rejectedSyncPromise(new SentryError(body || reason || 'Unknown transport error'));
      });

    return this._buffer.add(requestTask);
  }

  /** Waits for the internal promise buffer to drain. */
  public flush(timeout?: number): PromiseLike<boolean> {
    return this._buffer.drain(timeout);
  }

  // It is up to each transport implementation to determine how to make a request and return a corresponding response.
  // `TransportMakeRequestResponse` is different from `TransportResponse` because the client doesn't care about
  // these extra details.
  protected abstract _makeRequest(request: TransportRequest): PromiseLike<TransportMakeRequestResponse>;
}
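For context, here's a rough sketch of what the rate-limit helpers used above might look like. This is simplified and assumed, not the actual implementation: real Sentry rate limits are communicated via the `X-Sentry-Rate-Limits` and `Retry-After` response headers, and this sketch only handles the latter.

```ts
type RateLimits = Record<string, number>; // category -> "disabled until" timestamp in ms

function isRateLimited(limits: RateLimits, type: SentryRequestType): boolean {
  const disabledUntil = limits[type] ?? limits.all;
  return disabledUntil !== undefined && Date.now() < disabledUntil;
}

function rateLimitDisableUntil(limits: RateLimits, type: SentryRequestType): number {
  return limits[type] ?? limits.all ?? 0;
}

// Simplified: only honours a `Retry-After: <seconds>` header and applies it to all categories.
function updateRateLimits(limits: RateLimits, headers: Record<string, string | undefined>): void {
  const retryAfter = headers['retry-after'];
  if (retryAfter) {
    limits.all = Date.now() + parseInt(retryAfter, 10) * 1000;
  }
}
```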

Edit: Have to incorporate client outcomes here - will figure that out in a bit.

AbhiPrasad commented 2 years ago

Here's the functional form (since makeRequest is the only method a transport needs to override):

function createTransport<O extends BaseTransportOptions>(
  options: O,
  makeRequest: (request: TransportRequest) => PromiseLike<TransportMakeRequestResponse>,
): INewTransport {
  const buffer = makePromiseBuffer(options.bufferSize || 30);
  const rateLimits: Record<string, number> = {};

  const flush = (timeout?: number): PromiseLike<boolean> => buffer.drain(timeout);

  function send(envelope: Envelope, type: SentryRequestType): PromiseLike<TransportResponse> {
    const request: TransportRequest = {
      // I'm undecided if the type API should work like this
      // though we are a little stuck with this because of how
      // minimal the envelopes implementation is
      // perhaps there is a way we can expand it?
      type,
      body: serializeEnvelope(envelope),
    };

    if (isRateLimited(rateLimits, type)) {
      return rejectedSyncPromise(new SentryError(`oh no, disabled until: ${rateLimitDisableUntil(rateLimits, type)}`));
    }

    const requestTask = (): PromiseLike<TransportResponse> =>
      makeRequest(request).then(({ body, headers, reason, statusCode }): PromiseLike<TransportResponse> => {
        if (headers) {
          updateRateLimits(rateLimits, headers);
        }

        // TODO: This is the happy path!
        const status = eventStatusFromHttpCode(statusCode);
        if (status === 'success') {
          return resolvedSyncPromise({ status });
        }

        return rejectedSyncPromise(new SentryError(body || reason || 'Unknown transport error'));
      });

    return buffer.add(requestTask);
  }

  return { send, flush };
}
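To make the shape concrete, here's a hedged sketch of how a browser fetch transport could be built on top of createTransport. The response-header handling and the exact shape of TransportMakeRequestResponse are assumed here, and makeFetchTransport is an illustrative name, not the final API.

```ts
function makeFetchTransport(options: BrowserTransportOptions): INewTransport {
  return createTransport(options, request =>
    fetch(options.url, {
      method: 'POST',
      body: request.body,
      headers: options.headers,
    }).then(response => ({
      statusCode: response.status,
      // pass the rate-limit relevant headers back so createTransport can update its state
      headers: {
        'retry-after': response.headers.get('Retry-After') ?? undefined,
        'x-sentry-rate-limits': response.headers.get('X-Sentry-Rate-Limits') ?? undefined,
      },
    })),
  );
}
```
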
yordis commented 2 years ago

Please help me here with my thought, I have no context.


Why do you need that flush thing there? If that is required, is that always there?

I remember the SessionFlusher class, and I remember that the flush thingy didn't need to be part of the Transporter; the Transporter itself was something that would be used by the SessionFlusher but didn't do anything other than "send this info".

type Transporter = (request: TransportRequest) => PromiseLike<TransportMakeRequestResponse>;

Please help me understand when and why flush exists; I tried to look for it in the codebase but I got lost.


Something I like about your functional implementation is that it makes it really easy to see that you don't actually want people to have to mess around with buffering, and that the Transport itself is an actual function composed with the buffering (among whatever else you want it to be).

This means you can reduce the Transporter to be just:

type Transporter = (request: TransportRequest) => PromiseLike<TransportMakeRequestResponse>;

which you would never call directly, and your transporter would never be required to deal with buffering.

In the OOP style, you keep entering the realm of abstract classes, methods, and attributes when you barely have any actual state (closures will do), so I am biased, besides the extra bytes that you can potentially save.

Composition is easy to adapt, complexity is hard to pull apart.


Another way to look at this is that buffering, rate-limiting, or anything else is a simple wrapper (closure, middleware) around the transporter (like Redux, if you are familiar with it), for example:

type Transporter = (request: TransportRequest) => PromiseLike<TransportMakeRequestResponse>;

const rateLimiting = (opts: { rate: number }) => (next: Transporter): Transporter => {
  const internalState = '.....';
  return request => {
    if (cantSendRightNow) {
      // ...
    }
    return next(request);
  };
};

const buffer = (opts: { size: number }) => (next: Transporter): Transporter => {
  const internalState = '.....';
  return request => {
    if (DoINeedToBuffer) {
      // ...
    }
    return next(request);
  };
};

const finalProductionVersion = compose(
  buffer,
  rateLimiting,
  fetcherTransporter
);

Something around those lines.
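For reference, a minimal compose helper that would make the sketch above work (assumed here, not something that exists in the SDK) - the last argument is the base transporter and every preceding argument wraps it:

```ts
type TransporterMiddleware = (next: Transporter) => Transporter;

function compose(...layers: Array<TransporterMiddleware | Transporter>): Transporter {
  const base = layers[layers.length - 1] as Transporter;
  const wrappers = layers.slice(0, -1) as TransporterMiddleware[];
  // reduceRight so the first wrapper ends up as the outermost layer
  return wrappers.reduceRight((next, wrap) => wrap(next), base);
}

// usage, with the factories from above applied first (option values are made up):
// const transporter = compose(buffer({ size: 30 }), rateLimiting({ rate: 5 }), fetcherTransporter);
```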

P.S: a simple implementation of the idea https://github.com/straw-hat-team/fetcher


What I would do first is focus on the usage of such a Transporter: find where we use the interface as of today, make sure that we really needed an interface instead of a simple function, and work backward to the edge.


I am gonna try to be more helpful; please push back and put the ownership on my side to explain myself. I wish I had some standups 😄 programming is hard

kamilogorek commented 2 years ago

Please help me to understand When and Why the flush exists, I tried to look for in the codebase but I got lost.

We decided a long time ago that our capture calls should be fire-and-forget, and users should not be able to await a singular request. flush, however, is there so we can let users await the delivery of all pending requests prior to exiting the process, be it during a crash (eg. in Node) or before freezing the process (AWS Lambda, for example).

It's not used in the majority of cases, and is required only for specific needs.
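As a concrete (illustrative) example of the Lambda case, using the public Sentry.flush API:

```ts
import * as Sentry from '@sentry/node';

export const handler = async (event: unknown): Promise<void> => {
  try {
    // ... handler work, with Sentry.captureException(...) on errors, etc.
  } finally {
    // wait up to 2 seconds for all queued envelopes to be delivered
    // before the Lambda execution environment is frozen
    await Sentry.flush(2000);
  }
};
```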

AbhiPrasad commented 2 years ago

Otherway to look at this is that buffering, rate-limiting or anything else is a simple wrapper (Closure, middleware) around the transporter (Like Redux if you are familiar with it), for example:

Composition here is a really interesting suggestion, but considering the relatively fixed functionality of the transport, we can probably get away with the sole function constructor.

What I would do first is to focus on the usage of such Transporter, meaning, finding where we use the Interface as of today, make sure that we did need an interface instead of a simple function, and work backward to the edge.

Yup, great point. This is the inspiration for this GH issue in general.

Client Outcomes

We probably need to expand the interface for client outcomes

interface INewTransport {
  // TODO: How to best attach the type?
  // send an envelope to Sentry
  send(request: Envelope, type: SentryRequestType): PromiseLike<TransportResponse>;
  // flush everything in the transport buffer
  flush(timeout: number): PromiseLike<boolean>;
  // record client outcomes https://develop.sentry.dev/sdk/client-reports/
  record(reason: Outcome, category: SentryRequestType): void
}
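For illustration, a hedged example of how the client might call record when it drops something (the reason strings follow the client reports doc loosely; the exact values may differ):

```ts
// event dropped because the transport is currently rate limited
transport.record('ratelimit_backoff', 'event');

// transaction dropped because it was not sampled
transport.record('sample_rate', 'transaction');
```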

Incremental Migration Strategy

There have also been chats about how exactly we migrate this, as we want to minimize the amount of time we spend on the major branch.

We can first introduce the new Transport interfaces and have tests around them.

We can then convert the client to send all envelope items to the new transport. This does mean we'll have to accept a temporary bundle size increase. We'll have to flag this somehow if a user provides a custom transport, as that will override everything. Since we use category-based rate limits, rate limiting should still work fine.

public sendEvent(event: Event): void {
  if (event.type === 'transaction' && this.notCustomTransport()) {
    void this._newTransport.sendEnvelope(eventToEnvelope(event), 'transaction').then(null, reason => console.error(reason));
  } else {
    void this._transport.sendEvent(event).then(null, reason => console.error(reason));
  }
}

^ How does that sound?

There is now an issue with the buffer size though, since having 2 transports running will mean two buffers (double the total number of items). @kamilogorek do you think that is fine?

Offline and threaded workers

Newer Node versions introduced https://nodejs.org/api/worker_threads.html. It would be interesting to see how to use the functional transport while it is running on a separate thread.

Similar with a web worker transport?
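A rough, hedged sketch of what a worker-thread makeRequest could look like (this assumes a single request in flight at a time, and a hypothetical ./transport-worker.js that answers each posted message with a { statusCode, headers } object):

```ts
import { Worker } from 'worker_threads';

const worker = new Worker('./transport-worker.js');

const makeWorkerRequest = (request: TransportRequest): PromiseLike<TransportMakeRequestResponse> =>
  new Promise<TransportMakeRequestResponse>(resolve => {
    // naive: assumes responses come back in the same order requests were posted
    worker.once('message', (response: TransportMakeRequestResponse) => resolve(response));
    worker.postMessage({ type: request.type, body: request.body });
  });
```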

We also need to make sure makeRequest works well for offline transports. This might mean that individual transport implementations need to be able to configure a custom buffer data structure - though it might just mean that if "offline" mode is active, the base transport from createTransport simply re-queues the failed envelope.

I'm struggling to think through the best design for an offline system though. Maybe the base transport try catches makeRequest?
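One possible (hedged) shape for that: wrap makeRequest so a failed request is re-queued for a later retry instead of being dropped. The queue here is just an in-memory array standing in for whatever persistence an offline transport would actually use.

```ts
const withOfflineQueue =
  (
    makeRequest: (req: TransportRequest) => PromiseLike<TransportMakeRequestResponse>,
    queue: TransportRequest[], // could be backed by IndexedDB / disk in a real implementation
  ) =>
  (request: TransportRequest): PromiseLike<TransportMakeRequestResponse> =>
    makeRequest(request).then(undefined, error => {
      queue.push(request); // keep the envelope around so it can be retried when back online
      throw error;
    });
```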

yordis commented 2 years ago

After writing a lot, I found some stuff that brought up way too many thoughts; by the time I got to the end and read more code, something lit up.

Leaving some notes for myself; feel free to expand them. My apologies if it bothers you, just leaving a paper trail of thoughts that are hopefully useful at some point for someone.

I am gonna try to code things rather than write text; the topic is quite hard to follow, especially when I can't speak to you as soon as I see some things and double-check with you, and then I get lost in typing, my apologies. Let me see what I can do with code or something along those lines.

The conclusion from what I just realized is that the Hub is the only stateful thing; everything about extending how to capture things, process an Event, or do something with that event at the end is built around it. You already have some concepts in place, it is just complex.

```ts
type Event = {};
type ApiResponse = {};
type EventProcessor = (event: Event) => Promise<Event>;
type Backend = (event: Event) => Promise<ApiResponse>;
type Integration = () => Promise<void>;
type IntegrationId = string;

// stateful object about the installation
type Hub = {
  backends: Backend[];
  integration: Map<IntegrationId, Integration>;
  eventProcessors: Set<EventProcessor>;
};

// only one entry point to deal with the Sentry API
// everything else communicates with it and is pluggable
import { flush, init, captureMessage } from '@sentry/core';

// this is the opportunity to swap things and inject things
init({});

await flush();
```

---

> considering the relatively fixed functionality of the transport, we can probably get away with the sole function constructor.

Fair enough, but answering about offline and threads: the proposed solution would help with that topic, for example:

```ts
// notice that the idea is that you get to compose things into different pipelines
const finalProductionVersion = compose(
  buffer,
  offlineBackup, // keep adding more wrappers
  rateLimiting,
  webWorkerTransporter // thread version
);
```

When it comes to web workers or any kind of threading, step 1 would be to well-define the Messages and remove any serialization problem (it seems that you are already doing that, just double down on it). The previous example forces you to give more importance to that topic (Messages, Data), because the technicality of what happens in the pipeline isn't that important. All you care about is that it eventually gets to some transport that sends it to the Sentry backend (or not - it could be a web-worker one). So even technical things like buffering and rate-limiting and whatnot aren't that important to focus on at first (obviously you are already there), as long as the Message Passing is set in stone.

> There is now an issue of the buffer size though, since having 2 transports running will mean two buffers (double total amount of items). @kamilogorek you think that is fine?

Related to this, I am curious to know how the Sentry backend works. One way to deal with this is to allow the clients to generate the identity of the record (probably using some UUID), which would make it simpler for the backend to deal with idempotency. Otherwise, it is a nightmare most of the time 😨

---

> so we can enable users to await on delivery of all requests prior to exiting the process, be it during crash (eg. in node) or before freezing the process (AWS Lambda for example).

I get the intention, and I think now I understand why I was confused. Help me here with something. When I see interfaces I see swappable components of some kind, so here is a question:

- How many times does the flushing implementation change based on the life-cycle of the application?
- Does the flushing implementation change per environment (testing, prod, dev)?
- Does the flushing implementation change per runtime environment (React Native, Node, browser)?

Here is where my brain is going: `the Hub is something that deals with flushing`, not the transporter or something like that. But it all depends on the previous questions, and on being clear with words and naming, since I could be confused by some context.

For example, https://github.com/getsentry/sentry-javascript/blob/7f9483175a25ceb340707a0341c9cdb504cc9951/packages/browser/src/sdk.ts#L161

Notice that the client technically never changes, it is always `BrowserClient`. Also notice that there is barely any difference between the Browser and Node `sdk` (I am guessing I can call that the semi-Hub). So the concept of flushing doesn't have to be on a `per-client` basis or anything like that, and those SDKs wouldn't have to deal with such technicality.

This leads me to something else (help me even more on this one because I had no time to validate my thought 100%) (I AM SORRY FOR KEEP TYPING):

Most of the `sdk.ts` files per package are almost the same and in some cases just re-export from the previous one. Each package feels more like a series of metadata and integrations based on the environment.

Why wouldn't you make the `Hub` package be the thing that you initialize, and have the Browser, React, ... Node packages not hide the fact that there is a hub thingy? Something along the lines of:

```ts
import { makeWebWorkerBackend, makeDefaultBackend } from '@sentry/backends';
import * as Hub from '@sentry/hub';
import { makeReactIntegration } from '@sentry/react';

Hub.init({
  integrations: [
    makeReactIntegration(), // this could append metadata to the message and things like that as well
  ],
  backends: [
    // you could limit it to one; the implementation could allow many
    makeWebWorkerBackend(),
    makeDefaultBackend({ host: 'https://sentry.io/...' }),
    makeDefaultBackend({ host: 'https://self-hosted.io/...' }), // there you go, just made it up
  ],
});
```

---

The more I learn of the codebase, the more my brain keeps going and going; my apologies if it bothers you.
yordis commented 2 years ago

@AbhiPrasad is there any opportunity or way that I could hang out with you on Discord or something real-time? Maybe pairing on some coding?

Hopefully could be helpful for you.

AbhiPrasad commented 2 years ago

@yordis I’m on discord - and available to chat anytime after 14:00 CET next week Monday/Tuesday/Wednesday.

yordis commented 2 years ago

Well, I am waking up early next week for you. Gonna send some DMs on Discord.

kamilogorek commented 2 years ago

// TODO: How to best attach the type?

Why do we even need type if everything is going to be inside an envelope header?

// makes it way harder for users to implement their own transports

Why tho? If we provide all generic functions, it shouldn't be thaaat bad. Also as you pointed out in your functional example, makeRequest is the only thing people should need to provide to create their own transports.

We probably need to expand the interface for client outcomes

I'm not sure about that tbh. It's quite hard to justify the existence of this functionality inside the transport. I never liked it, but we added it because there is no other easy way currently. If anything, I'd add a method for sending reports via beacon (eg. a report method, not record). I'd much prefer it to be something client-bound, rather than make it live inside the transport. It'd also be possible to tree-shake it if someone decides not to use client reports (not based on this dynamic boolean in the options, but there are ways to make such features tree-shakeable, eg. see https://github.com/getsentry/sentry-javascript/blob/v7-dev/packages/browser/src/sdk.ts#L22-L53 for how I made defaultIntegrations tree-shakeable - tl;dr when using the raw BrowserClient, defaultIntegrations won't be used and is thus dropped).
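As a hedged sketch of that client-bound idea (the payload shape loosely follows the client reports doc; the function name and exact format here are made up):

```ts
function reportDiscardedEvents(
  url: string,
  discardedEvents: Array<{ reason: string; category: string; quantity: number }>,
): void {
  // sendBeacon queues the request even while the page is unloading
  if (typeof navigator !== 'undefined' && typeof navigator.sendBeacon === 'function') {
    navigator.sendBeacon(url, JSON.stringify({ timestamp: Date.now() / 1000, discarded_events: discardedEvents }));
  }
}
```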

^ How does that sound?

As a temporary workaround it should be enough.

There is now an issue of the buffer size though, since having 2 transports running will mean two buffers (double total amount of items). @kamilogorek you think that is fine?

That's fine IMO, transactions are usually not flooding the buffer anyway, and your additional branch will only serve those.

I'm struggling to think through the best design for an offline system though. Maybe the base transport try catches makeRequest?

It should IMO happen prior to the actual sending call. For Node we don't really need offline support, as I'd not expect servers to have connectivity problems. It's more important to target mobile/browsers. For those, we can rely on browser APIs to detect the internet connection. It's not bulletproof, but pinging DNS servers before each call is overkill IMO. And it takes time to rely on request timeouts, especially since requests stay in the buffer while they are being resolved.

How many times does the flushing implementation change based on the life-cycle of the application? Does the flushing implementation change per environment (testing, prod, dev)? Does the flushing implementation change per runtime environment? (react native, node, browser)?

No, no, and no. Flushing here is nothing more than locking the buffer and calling await on it until it drains :)

The Hub is something that deals with flushing, not the transporter or something like that.

It's only like that because hub is the proxy to the client, which in turn is bound to a given transport instance.

Most of sdk.ts files per package are almost the same and in some cases just re-exporting from the previous one. Each package files more like a series of metadata and integrations based on the environments. Why you wouldn't make the Hub package to be the thing that you initialize and the packages Browser, React, ... Node; don't hide the fact there is a hub thingy. Something around the lines of:

This is effectively what init of each sdk.ts does right now. It adds default integrations/options for the given environment, but other than that, it uses shared code. I do agree though that plenty of code should be extracted even more. Even the flush implementation is basically the same for all of them, yet we duplicate this code.

AbhiPrasad commented 2 years ago

Why do we even need type if everything is going to be inside an envelope header?

It's because the type is in the envelope item headers - not the envelope headers in general. This means we have to reach in and grab the envelope item header. I guess that is fine.
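Roughly (and simplified), the shape looks like this - the type lives on each item's header, not on the envelope header:

```ts
// Simplified illustration of the envelope tuple shape: [envelopeHeaders, items]
const envelope = [
  { sent_at: '2022-03-01T00:00:00.000Z' }, // envelope headers: no item type here
  [
    // each item is [itemHeaders, payload] - the type we need lives on the item headers
    [{ type: 'transaction' }, { /* serialized transaction payload */ }],
  ],
];
```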

I'd much prefer it to be something client-bound

Yeah this makes sense, but I'm gonna not solve it here tbh - let's just do it later.

yordis commented 2 years ago

I'm not sure about that tbh. It's quite hard to justify the existence of this functionality inside the transport. I never liked it

So happy to see my thoughts are aligned with yours, @kamilogorek - your response was actually the conclusion I was arriving at the more I read the code and understood things.

timfish commented 2 years ago

It should IMO be prior to actual sending call. For node we don't need offline really, as Id not expect servers to have connectivity problem.

In the Electron SDK, we support offline by wrapping the default transport and persisting failed envelopes to try later: https://github.com/getsentry/sentry-electron/blob/master/src/main/transports/electron-offline-net.ts

Its more important to target mobile/browsers. For those, we can rely on browser APIs to detect internet connection. It's not bulletproof...

It's more common to have navigator.onLine === true and not be able to make a connection than it is to ever see navigator.onLine === false. In my mind, this makes using it to detect connection status next to useless because you've got to cater for the timeouts/failures anyway. All you know for sure is that if it's false there's no point even trying.
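So at best it's only useful as a negative signal, e.g. something like:

```ts
// only trustworthy in one direction: if the browser says we're offline, don't even try
if (typeof navigator !== 'undefined' && navigator.onLine === false) {
  // re-queue the envelope (or drop it) instead of making a request that cannot succeed
} else {
  // attempt the request and still handle timeouts/failures, since `true` proves nothing
}
```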

If you install Docker, VirtualBox, VMware or any VPN client, navigator.onLine is ALWAYS true, even with aeroplane mode enabled. This is even true for VPNs and Android!

but pinging DNSs before each call is an overkill IMO. And it takes time to rely on request timeouts, especially that they can stay in the buffer for the time they are being resolved.

Yes, no real point waiting for DNS to timeout when you can just wait for the request to Sentry to timeout!

kamilogorek commented 2 years ago

If you install Docker, Virtual Box, VMWare or any VPN client navigator.onLine is ALWAYS true, even with aeroplane mode enabled. This is even true for VPNs and Android!

jeez, TIL 🥲

AbhiPrasad commented 2 years ago

All new transports have been merged in - now just waiting on v7 for us to delete the old ones.