getsentry / sentry-javascript

Official Sentry SDKs for JavaScript
https://sentry.io
MIT License
7.99k stars 1.57k forks source link

Improve Fetch Stream Handling for Long-lived Streams to Ensure Connection Closure #13996

Open Lei-k opened 4 weeks ago

Lei-k commented 4 weeks ago

Is there an existing issue for this?

How do you use Sentry?

Self-hosted/on-premise

Which SDK are you using?

@sentry/nextjs

SDK Version

8.33.1 ~ develop (build by myself)

Framework Version

Next 14.2.13

Link to Sentry event

No response

Reproduction Example/SDK Setup

sentry.client.config.ts

// This file configures the initialization of Sentry on the client.
// The config you add here will be used whenever a users loads a page in their browser.
// https://docs.sentry.io/platforms/javascript/guides/nextjs/

import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: '__DSN',

  // Add optional integrations for additional features
  integrations: [Sentry.replayIntegration()],

  // Define how likely traces are sampled. Adjust this value in production, or use tracesSampler for greater control.
  tracesSampleRate: 1,

  // Define how likely Replay events are sampled.
  // This sets the sample rate to be 10%. You may want this to be 100% while
  // in development and sample at a lower rate in production
  replaysSessionSampleRate: 0.1,

  // Define how likely Replay events are sampled when an error occurs.
  replaysOnErrorSampleRate: 1.0,

  // Setting this option to true will print useful information to the console while you're setting up Sentry.
  debug: false,
});

sentry.edge.config.ts

// This file configures the initialization of Sentry for edge features (middleware, edge routes, and so on).
// The config you add here will be used whenever one of the edge features is loaded.
// Note that this config is unrelated to the Vercel Edge Runtime and is also required when running locally.
// https://docs.sentry.io/platforms/javascript/guides/nextjs/

import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: '__DSN',

  // Define how likely traces are sampled. Adjust this value in production, or use tracesSampler for greater control.
  tracesSampleRate: 1,

  // Setting this option to true will print useful information to the console while you're setting up Sentry.
  debug: false,
});

sentry.server.config.ts

// This file configures the initialization of Sentry on the server.
// The config you add here will be used whenever the server handles a request.
// https://docs.sentry.io/platforms/javascript/guides/nextjs/

import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: '__DSN',

  // Define how likely traces are sampled. Adjust this value in production, or use tracesSampler for greater control.
  tracesSampleRate: 1,

  // Setting this option to true will print useful information to the console while you're setting up Sentry.
  debug: false,
});

Steps to Reproduce

Hi Sentry Team,

Thank you for adding the trackFetchStreamPerformance option in #13951 and setting it to false by default to resolve #13950. However, I would still like to enable this feature in my project, so I’ve submitted PR #13967 to address the issue.

Problem Summary

In my understanding, there are several scenarios where a fetch connection ends:

  1. The server finishes responding, closing the connection.
  2. An error occurs during the connection, closing it.
  3. The client ends the connection via AbortController and AbortSignal.
  4. All Response ReadableStreams are canceled, prompting the browser to close the connection.

While cases 1-3 can be handled automatically by underlying mechanisms (including cloned Response streams), scenario 4 requires manual handling. As noted in #13950, Sentry’s fetch implementation cannot detect when streams are canceled via cancel(), particularly in libraries like flv.js, leading to long-lived connections that don't close.

Proposed Solution

To resolve this, I overrode the cancel method in the Response body to ensure that connections are properly terminated when the original response is canceled. Below is a modified implementation of resolveResponse:

async function resloveReader(reader: ReadableStreamDefaultReader, onFinishedResolving: () => void): Promise<void> {
  let running = true;
  while (running) {
    try {
      const { done } = await reader.read();
      running = !done;
      if (done) {
        onFinishedResolving();
      }
    } catch (_) {
      running = false;
    }
  }
}

export function resolveResponse(res: Response, parentRes: Response, onFinishedResolving: () => void): void {
  if (!res.body || !parentRes.body) {
    if (res.body) {
      res.body.cancel().catch(_ => {
        // noop on error
      });
    }
    return;
  }

  const body = res.body;
  const parentBody = parentRes.body;
  const responseReader = body.getReader();

  const originalCancel = parentBody.cancel.bind(parentBody) as (reason?: any) => Promise<any>;

  parentBody.cancel = async (reason?: any) => {
    responseReader.cancel('Cancelled by parent stream').catch(_ => {});
    await originalCancel(reason);
  };

  const originalGetReader = parentRes.body.getReader.bind(parentBody) as
    (options: ReadableStreamGetReaderOptions) => ReadableStreamDefaultReader;

  parentBody.getReader = ((opts?: any) => {
    const reader = originalGetReader(opts) as ReadableStreamDefaultReader;
    const originalReaderCancel = reader.cancel.bind(reader) as (reason?: any) => Promise<any>;

    reader.cancel = async (reason?: any) => {
      responseReader.cancel('Cancelled by parent reader').catch(_ => {});
      await originalReaderCancel(reason);
    };

    return reader;
  }) as any;

  resloveReader(responseReader, onFinishedResolving).finally(() => {
    try {
      responseReader.releaseLock();
      body.cancel().catch(() => {});
    } catch (_) {}
  });
}

Replay Integration Issue

While implementing this fix, I also noticed that enabling the replayIntegration option causes the same issue where connections cannot be closed. I traced the problem to the _tryGetResponseText function in packages/replay-internal/src/coreHandlers/util/fetchUtils.ts. This function uses response.text() to fetch response data with a 500ms timeout, but this call fails to detect the use of cancel().

I also referenced the response.text() implementation in Node.js (unfortunately, I was only able to find the Node.js implementation, but I believe the behavior should be similar in the browser environment). In Node.js, response.text() ultimately calls readAllBytes to retrieve the data, as shown below:

async function readAllBytes (reader, successSteps, failureSteps) {
  const bytes = [];
  let byteLength = 0;

  try {
    do {
      const { done, value: chunk } = await reader.read();

      if (done) {
        successSteps(Buffer.concat(bytes, byteLength));
        return;
      }

      if (!isUint8Array(chunk)) {
        failureSteps(TypeError('Received non-Uint8Array chunk'));
        return;
      }

      bytes.push(chunk);
      byteLength += chunk.length;
    } while (true);
  } catch (e) {
    failureSteps(e);
  }
}

This implementation does not account for the 500ms timeout that we set, resulting in readAllBytes never exiting. Therefore, I modified the _tryGetResponseText implementation to manage the Response stream directly, as follows:

async function _tryGetResponseText(response: Response): Promise<string | undefined> {
  if (!response.body) {
    throw new Error('Response has no body');
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let result = '';
  let running = true;

  const timeout = setTimeout(() => {
    if (running) {
      reader.cancel('Timeout while trying to read response body').catch(_ => {});
    }
  }, 500);

  try {
    while (running) {
      const { value, done } = await reader.read();
      running = !done;

      if (value) {
        result += decoder.decode(value, { stream: !done });
      }
    }
  } finally {
    clearTimeout(timeout);
    reader.releaseLock();
  }

  return result;
}

Expected Result

Application can correctly exit connections for long-lived streams

Actual Result

The connection does not exit as expected.

Conclusion

With the modifications above, my application no longer experiences the issue of long-lived connections that cannot be closed. I would appreciate it if you could review this solution and provide any feedback for further improvement. Thank you!

chargome commented 4 weeks ago

Hi @Lei-k thanks for this detailed description and for outlining the issue with replay. We'll check your pr, see if all the tests in our ci pass and then get back to you - let's continue this conversation in the PR 👍