Serverless Function Tracing Needs an Isolated Scope

derekrprice commented 3 years ago

[x] Review the documentation: https://docs.sentry.io/
[x] Search for existing issues: https://github.com/getsentry/sentry-javascript/issues
[x] Use the latest release: https://github.com/getsentry/sentry-javascript/releases
[x] Provide a link to the affected event from your Sentry account: https://sentry.io/organizations/par-tech-inc/performance/tdp-log-processing:248e161ab69049b29769dbbe61ed5da6/?transaction=process_generator_sessions

Package + Version

[x] @sentry/node 6.3.6

Version:

6.3.6

Description

Here is my withSentry wrapper for Azure serverless Function invocations:

const Sentry = require("@sentry/node");
const Tracing = require("@sentry/tracing");
const mongodb = require("mongodb");

const config = require("../../config/sentry");

if (config.dsn) {
    Sentry.init({
        ...config,
        integrations: [
            // enable DB calls tracing
            new Tracing.Integrations.Mongo({ useMongoose: true })
            // new Tracing.Integrations.Mysql()  // No integration available for mysql2.
        ]
    });
}

/**
 * Add Sentry transaction information to outgoing messages.
 * @param {*} transaction
 * @param {*} results
 * @returns
 */
const propagateTransaction = (transaction, results) => {
    if (results && transaction) {
        for (const message of Object.values(results)) {
            Object.assign(message, {
                sentryParentSpanId: transaction.spanId,
                sentryTraceId: transaction.traceId,
                sentrySampled: transaction.sampled
            });
        }
    }
    return results;
};

/**
 * A higher order function which logs error data to Sentry.
 *
 * @param {*} callback          The actual function which might throw errors.
 */
module.exports = callback => async (context, trigger, ...args) => {
    let transaction;
    if (config.dsn) {
        // Define a transaction for performance tracking.
        const transactionContext = {
            op: "serverless-task",
            name: context?.executionContext?.functionName || "test-function"
        };
        if ("sentryParentSpanId" in trigger) {
            Object.assign(transactionContext, {
                parentSpanId: trigger.sentryParentSpanId,
                traceId: trigger.sentryTraceId,
                sampled: trigger.sentrySampled
            });
        }
        transaction = Sentry.startTransaction(transactionContext);

        Sentry.configureScope(scope => {
            scope.setSpan(transaction);
            scope.setContext(
                "executionContext",
                context?.executionContext || {
                    errorMessage:
                        "No executionContext defined.  This could be a notification from a test environment."
                }
            );
            scope.setContext("functionTrigger", trigger);
        });
    }

    try {
        return propagateTransaction(
            transaction,
            await callback(context, trigger, ...args)
        );
    } catch (error) {
        if (config.dsn) {
            Sentry.captureException(error, scope => {
                scope.setContext(
                    "executionContext",
                    context?.executionContext || {
                        errorMessage:
                            "No executionContext defined.  This could be a notification from a test environment."
                    }
                );
                scope.setContext("functionTrigger", trigger);
                scope.setContext(
                    "theFullContext",
                    Object.assign(
                        {},
                        ...Object.keys(context).map(k => ({
                            [k]: JSON.stringify(context[k])
                        }))
                    )
                );
            });
            scope.setSpan(transaction);
            await Sentry.flush();
        }

        throw error;
    } finally {
        await transaction?.finish();
    }
};

Defining an Azure Function like this:

module.exports = withSentry(processLogFile);

This logs everything correctly for exceptions since the scope is set immediately before sending the log, but performance tracing associates DB children with the wrong span and other context. This appears to be because the DB integrations use the span set on the scope. Since we have a high volume application (on the order of 3400 invocations per minute), many function invocations run concurrently and, since Sentry.configureScope() configures the span in a global scope, logged child DB transactions are associated not with the invocation they actually ran during but instead with context of the function invocation that last called Sentry.configureScope(). In these images, there are dozens of MongoDB calls associated with the _sessions function even though the inserts are actually all called by the parent _timeseries function:

derekrprice commented 3 years ago

Presumably, I would want to wrap calls in Sentry.withScope(), but it doesn't look like that can handle an async callback, so that won't work here.

derekrprice commented 3 years ago

Every Azure Function invocation is passed a unique "invocation ID" which you could probably use to index a map of scope stacks but the current Sentry SDK implementation's single global scope stack isn't going to cut it here.

derekrprice commented 3 years ago

Not the most elegant code, but so far I haven't been able to come up with anything better within the limitations of JS. Something like this should work to replace the SDK's existing withScope. I do also need it to be async and return its caller's result because I need to be able to propagate any exceptions thrown inside the callback.

Basically, this maintains the scope map index as a dynamic function name - you would have to replace the SDK's current global stack with a global map, but that shouldn't be complicated. You wouldn't even need to use the function ID I suggested earlier - each call to withScope can simply supply its own retrievable unique key. My key generation is simplistic, but I think it should be sufficient for this use case. This even works when nested:

function getScopeKey() {
    const e = new Error();
    const key = e.stack.match(/(?<=as sentryScopeKeyIs:)(?:\d+)/);
    return key?.[0];
}

let uniqueKey = 1;
async function withScope (callback) {
    const id = uniqueKey++;
    const functionName = "as sentryScopeKeyIs:" + id;
    x[functionName] = function(callback) {
        return callback();
    }

    return x[functionName](callback);
}

// One level.
withScope(getScopeKey).then(scopeKey => console.log(scopeKey));

// Nested scopes.
withScope(async () => {
    return [
        getScopeKey(),
    await withScope(getScopeKey)
    ];
}).then(scopeKey => console.log(scopeKey));

derekrprice commented 3 years ago

This is tested in chrome, btw. Hopefully it works as well with node's chromium engine. Other browsers may need some sort of polyfill.

derekrprice commented 3 years ago

Bump. Thought about this some more. Obviously having withScope become async would be a breaking change, but that isn't the most important part of this fix and I think if I passed withScope an async function, withScope itself wouldn't need to be async as long as it returned the result of the callback since I can await the returned promise then.

Anyhow, the rest of this is a really important change. Any chance of it getting into a release soon?

derekrprice commented 3 years ago

Here is the revised code. Looks like I had left the dummy "x" object in my previous code global too, which is unneeded:

function getScopeKey() {
    const e = new Error();
    const key = e.stack.match(/(?<=as sentryScopeKeyIs:)(?:\d+)/);
    return key?.[0];
}

let uniqueKey = 1;
function withScope (callback) {
    const id = uniqueKey++;
    const functionName = "as sentryScopeKeyIs:" + id;
    const x = {
        [functionName]: function(callback) {
            return callback();
        }
    };

    return x[functionName](callback);
}

async function doApplicationStuff() {
    return getScopeKey();
}

// One level.
withScope(doApplicationStuff).then(scopeKey => console.log(scopeKey));

// Nested scopes.
withScope(async () => {
    return [
        getScopeKey(),
    withScope(doApplicationStuff)
    ];
}).then(scopeKey => console.log(scopeKey));

kamilogorek commented 3 years ago

Have you tried to play around with manually pushing/popping scopes? This is how we do it in the AWSLabda case - https://github.com/getsentry/sentry-javascript/blob/master/packages/serverless/src/awslambda.ts#L267-L287

If it won't cut it, then it'll have to wait until we rework how integrations work, as using getCurrentHub doesn't play nicely with multiple client instances or with async code. This, however, requires a lot of internal refactoring, which will take some time until completed.

derekrprice commented 3 years ago

Pushing and popping scopes does not work with async code. What is wrong with my suggested replacements for withScope and getScope? I believe it would be a non-breaking change.

richardsimko commented 3 years ago

Any particular reason pushing and popping scopes doesn't work for you? I was in a similar situation and was tearing my hair off trying to find a solution to how to keep the Sentry context tied to a request in a web service environment. Passing Koa's context object around is not an option in real applications, so I don't know why the usage of push/pop scopes is so poorly documented when it comes to Sentry tracing.

Anyhow, this solved it for me:

// Mostly code from https://docs.sentry.io/platforms/node/guides/koa/
const transaction = sentry.startTransaction({
    name: `${reqMethod} ${reqUrl}`,
    op: 'http.server'
});
const hub = sentry.getCurrentHub();
try {
    // Push the scope
    const scope = hub.pushScope();
    scope.setSpan(transaction);
    context.res.on('finish', () => {
        transaction.setHttpStatus(context.status);
        transaction.finish();
    });
    await next();
} finally {
    // ... some other cleanup and finishing the transaction
    hub.popScope();
}

derekrprice commented 3 years ago

@richardsimko: Because we have dozens, sometimes hundreds of threads running in parallel (yes, I understand that JS threads are virtual, but the end result is the same). So, every time a process awaits IO, another thread might pick up control. Actual flow ends up looking like:

thread 1: const scope = hub.pushScope();
thread 1: scope.setSpan(transaction);
thread 1: // ... do stuff that might await IO
thread 2: const scope = hub.pushScope();
thread 2: scope.setSpan(transaction);
thread 2: // ... do stuff that might await IO
thread 1: // do stuff that logs to thread 2's scope & transaction
thread 1: // ... do stuff that might await IO
thread 3: // ... and so on...

In practice, we have an Azure Functions app that receives dozens of queued requests in waves, and the waves sometimes come in quick succession. I believe that each listener is configured to receive up to 32 queued messages at a time and about a dozen listeners are running in parallel, listening to different queues. We end up seeing all the transactions from a wave attached to whichever was the last to call hub.pushScope() and register its transaction, and often transactions are even registered to the wrong app, since the individual application listeners are also running in parallel and pulling 32 messages each. I've seen HTTP calls and DB queries from what looks like over 100 discrete function invocations attached to a single function invocation, with the other all invocations registering as not having logged any HTTP or DB requests to Sentry.

richardsimko commented 3 years ago

Since I don't know the details of your implementation I can't say for sure, but to me this sounds very similar to how a Node web server works. It also handles concurrent (Of course concurrent-ish, since Node is single-threaded as you say) requests and while waiting for slow async tasks it processes other things. This is exactly the scenario we're experiencing and from my understanding the scenario push/pop scope is intended to solve when reading the docs:

The default SDK integrations will push and pop scopes intelligently. For instance web framework integrations will create and destroy scopes around your routes or controllers. https://docs.sentry.io/platforms/javascript/enriching-events/scopes/#whats-a-scope-whats-a-hub

(Although I disagree on the "intelligently" part, but that's a different discussion)

derekrprice commented 3 years ago

It is exactly a Node app. Sentry's Node SDK's push and pop is very naïve and does not take into account multiple threads. There is a single shared hub object with a single stack. Push/pop was the first thing I tried. It did not work. Sentry's support and dev teams are aware of the issue but have not issued an ETA on providing a fix. From my discussions with them, they seem to think that it would require a complete rewrite of their scope handlers but, after poring over their code, I disagree. The problem is challenging, but I documented a workaround and a suggested path forward with sample code above.

There is, unfortunately, no way in JS to make a "context" object available to objects down the call stack without actually passing them as arguments to all intervening functions, which makes this a tough problem to solve in JS. My workaround, which encodes a key into a function name on the call stack and then uses a bogus exception to retrieve it from the stack trace, would work, but I haven't had time to make that into a PR for the Sentry SDK yet. I was hoping one of their developers would be inspired by my test code which proves the concept will work, but that hasn't happened yet. I will keep trying to get time to solve this for them here but other pressures are keeping my focus on other projects at the moment.

CesarLanderos commented 3 years ago

@derekrprice I happened to stumble with this, I was having the same issues, I recommend checking the AsyncLocalStorage API from node, I am pretty sure it will help you.

https://nodejs.org/api/async_context.html

andreialecu commented 3 years ago

derekrprice commented 3 years ago

@richardsimko, check out #4071 for a detailed explanation of why you may not have experienced this issue yet with your web app. Basically, the problem is hidden in low traffic scenarios.

github-actions[bot] commented 3 years ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

derekrprice commented 3 years ago

Don't be stale.

mattfysh commented 2 years ago

Could this not also be solved the same way the http handler works? i.e. running the wrapped handler in a new domain?

AbhiPrasad commented 1 year ago

With the 7.48.0 release of the JS SDK we use AsyncLocalStorage instead of domains under the hood, which should help with some of these problems.

If you want to manually wrap methods in an async context to isolate breadcrumbs/scope/spans, you can use Sentry.runWithAsyncContext, which is documented here: https://docs.sentry.io/platforms/node/configuration/async-context/

getsentry / sentry-javascript