GoogleCloudPlatform / functions-framework-nodejs

FaaS (Function as a service) framework for writing portable Node.js functions
Apache License 2.0
1.29k stars 160 forks source link

Stackdriver Trace Integration #57

Open grant opened 5 years ago

grant commented 5 years ago

Stackdriver Trace Integration

Add Stackdriver Trace support to the Functions Framework.

Library: https://github.com/googleapis/cloud-trace-nodejs Google Issue: https://issuetracker.google.com/issues/124761993

@stew-r Can you add details as to what this would do? Would this be a tutorial or part of the actual framework?

Example Code

"@google-cloud/debug-agent": "^4.0.1",
"@google-cloud/profiler": "^2.0.2",
"@google-cloud/trace-agent": "^4.1.0",
import * as debugAgent from '@google-cloud/debug-agent';
import * as profiler from '@google-cloud/profiler';
import * as trace from '@google-cloud/trace-agent';

let toolsAreSetup = false;

/**
 * Starts Google Cloud tooling.
 * - StackDriver Trace
 * - StackDriver Profiler
 * - StackDriver Debugger
 */
export function startTooling() {
  // Ensure tools are only setup once per process.
  if (toolsAreSetup) {
    return null;
  } else {
    toolsAreSetup = true;
  }

  // Start Trace
  trace.start();

  // Start profiler
  profiler
    .start({
      serviceContext: {
        service: getServiceName(),
        version: 'v1',
      },
    })
    .catch(err => {
      console.error('Failed to start profiler', err);
    });

  // Start debugger
  const debug = debugAgent.start({
    serviceContext: {
      service: getServiceName(),
      version: 'v1',
      // version: functions.config().tsc.version
    },
  });

  // Return all tools
  return {
    trace,
    profiler,
    debug,
  };
}

/**
 * Gets the name of the function for logging.
 * @returns {string} The name of the function with a '_function' suffix.
 */
function getServiceName(): string {
  const serviceName = process.env.FUNCTION_TARGET || 'unknown_cloud';
  return `${serviceName}_function`;
}
stew-r commented 5 years ago

Ideally, this would be part of the Functions Framework and a user of the framework gets trace support built in automatically. The user does not need to think about importing a client or doing any setup. They can disable this feature by passing some configuration.

grant commented 5 years ago

I added a WIP PR for this feature.


Edit: It looks like there's a codelab for getting this setup: https://codelabs.developers.google.com/codelabs/cloud-function-logs-traces/index.html?index=..%2F..index#0

Elyx0 commented 4 years ago

Struggling aswell here from https://github.com/googleapis/nodejs-logging/issues/591 any updates?

grant commented 4 years ago

So far there has been no priority for this feature, no upvotes. The issue is that require('@google-cloud/trace-agent').start(); must be the first module loaded.

I don't think it would be hard to implement that, but we need someone to help implement a good PR.

Elyx0 commented 4 years ago

Can I add this myself @grant, does it need to be the first module in the index.src exporting the cloud functions or required inside every functions?

dinigo commented 4 years ago

As I understad it, this feature was available at some point, but was removed afterwards, and now it's being re-implemented in the FF?

What kind of help do you need in the PR? I may be able to savage some time from here and there. We (Divisadero) rely on Cloud Functions and want more visibility of what's happening within executions

grant commented 4 years ago

I'm not an expert in Stackdriver Trace so I'm not sure of the requirements for this.

We need a proposed implementation for optional trace support that would not affect the performance of all functions that don't enable it. Then we need a PR that follows that proposal that we can get approved by the eng team.

I think GCF might already have trace support, I'm not sure.

manwithsteelnerves commented 4 years ago

Any update on this? Can someone suggest how can I assure require('@google-cloud/trace-agent').start(); will be loaded at start?

Currently by default the trace dashboard showing some details but not sure if I add a trace agent, I can debug my major problem. Currently I have delays of around more than 10-40 seconds for each function, which is really bothering. And this won't happen every time.

Can someone give more details on this agent?

  1. Request started - Internal log from system
  2. Start - Point where my express app code starts
  3. End - Point where the function ends

I have no clue why its taking so much time to start executing my code :|

RequestTrace

manwithsteelnerves commented 4 years ago

Can someone give a hint on how to consider the stack trace logging in the above image?

grant commented 4 years ago

I haven't looked at this since. I would assume the steps would be:

sk- commented 4 years ago

Hi @grant, setting the environment variable does not work.

However, I have partially succeeded on enabling opentelemetry for both background and http functions.

Note that we only get HTTP spans, and we don't get specific FASS ones (see https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/faas.md).

In order to achieve that, one needs to inject a middleware before finishing the creation of the express app. This is similar to what I did to extract the trace and execution id (see https://github.com/googleapis/nodejs-logging/issues/591#issuecomment-641562263).

It'd be great if you could add the executionId and traceId to the context, before calling the background function.

Also, if you could add a hook to inject a custom middleware without having to fiddle with the inner implementation of the framework and express, that would make our code much more reliable.

My idea is to write either a functions-framework plugin for opentelemetry or an express middleware adding the trace.

I think a plugin is not possible as we cannot control when the script is being included, specially if you are using Firebase Functions. On the other hand, the middleware requires a reliable way of injecting the plugin.

grant commented 3 years ago

Looking at internal worker, here is the older Node versions (non FF) feature support. (I've verified this code does not contain internal workings.)

I think we simply need to include something like this in the FF at the top of the invoker.

let tracer;
if (process.env.GOOGLE_CLOUD_TRACE_ENABLED &&
    process.env.GOOGLE_CLOUD_TRACE_ENABLED.toLowerCase() === 'true') {
  // If Trace API is not enabled in user projet, below will still succeed but
  // traces will not be registered.
  // Override root span name as it uses request.path for requests to an
  // 'express' server.Also force traces flush after each function execution by
  // setting buffer size to 0.
  tracer = require('@google-cloud/trace-agent').start({
    // As the worker only services one request at a time, this enables an
    // optimization that reduces overhead introduced by context tracking across
    // asynchronous boundaries.
    clsMechanism: 'singular',
    serviceContext: {service: FUNCTION_NAME, version: FUNCTION_VERSION},
    rootSpanNameOverride: FUNCTION_NAME,
    bufferSize: 0,
    stackTraceLimit: -1,
    contextHeaderBehavior: 'require',
    samplingRate: 0,
  });
}

let debug;
if (process.env.GOOGLE_CLOUD_DEBUG_ENABLED &&
    process.env.GOOGLE_CLOUD_DEBUG_ENABLED.toLowerCase() === 'true') {
  // If Debugger API is not enabled in user projet, below will still succeed but
  // debugee will not be registered.
  debug = require('@google-cloud/debug-agent').start({
    serviceContext: {service: FUNCTION_NAME, version: FUNCTION_VERSION},
  });
}

if (process.env.GOOGLE_CLOUD_PROFILER_ENABLED &&
    process.env.GOOGLE_CLOUD_PROFILER_ENABLED.toLowerCase() === 'true') {
  // If Profiler API is not enabled in user projet, below will still succeed but
  // Profiler data will not be registered.
  require('@google-cloud/profiler').start({
    serviceContext: {service: FUNCTION_NAME, version: FUNCTION_VERSION},
  });
}

This piece of code must be injected before any other code is executed (simple variables/functions are OK).

There's also some other code required.

// When invoking a function...
 setTraceHeader(res);
 // execute users function

/**
 * Adjusts res object before it's ready to be passed back.
 * @param {!express.Response} res
 */
var setTraceHeader = function(res) {
  try {
    if (tracer != null && tracer.isRealSpan(tracer.getCurrentRootSpan())) {
      res.set(GOOGLE_CLOUD_SPAN_SAMPLED_HEADER_FIELD, '1');
    } else {
      res.set(GOOGLE_CLOUD_SPAN_SAMPLED_HEADER_FIELD, '0');
    }
  } catch (ex) {
    console.error('Error when setting trace header ' + ex.toString());
  }
};

Deps:

    "@google-cloud/trace-agent" : "3.3.1",
    "@google-cloud/debug-agent" : "3.0.0",
    "@google-cloud/profiler" : "0.2.2"

(Internal cs/worker/nodejs/worker.js)

kintel commented 3 years ago

@grant I think your suggestion will work, but keep in mind that @google-cloud/trace-agent is will be deprecated in favour of OpenTelemetry sometime the next 2-6 months.

grant commented 3 years ago

Well, it looks like we should add this then? https://github.com/open-telemetry/opentelemetry-js-contrib/blob/master/examples/express/tracer.js

The instructions/API for using this is very confusing imo.

sk- commented 3 years ago

@grant Below is the config helper we are using in our codebase, with some comments to clarify what each config means. Note that it's not necessary to explicitly add the plugins, as those will be autodetected. Note also that this code is using OpenTelemetry 0.10, as there are some breaking changes in the last version and the google pacakges have not been updated yet.

const opentelemetry = require('@opentelemetry/api');
const {
  TraceExporter,
} = require('@google-cloud/opentelemetry-cloud-trace-exporter');
const { NodeTracerProvider } = require('@opentelemetry/node');
const { BatchSpanProcessor } = require('@opentelemetry/tracing');
const {
  CompositePropagator,
  HttpCorrelationContext,
  HttpTraceContext,
  ProbabilitySampler,
} = require('@opentelemetry/core');
const {
  CloudPropagator,
} = require('@google-cloud/opentelemetry-cloud-trace-propagator');

exports.initTracing = () => {
  const provider = new NodeTracerProvider({
    // This is important to keep the cost reasonable
    sampler: new ProbabilitySampler(0.01),
  });
  provider.register({
    // Use CloudPropagator + defaults
    propagator: new CompositePropagator({
      propagators: [
        new HttpCorrelationContext(),
        new HttpTraceContext(),
        // The first 2 propagators are the default ones, and the CloudPropagator is the one used to extract Google defined headers.
        new CloudPropagator(),
      ],
    }),
  });

  // This is the exporter used to send the traces to Google Cloud
  const exporter = new TraceExporter();

  provider.addSpanProcessor(new BatchSpanProcessor(exporter));
  opentelemetry.trace.setGlobalTracerProvider(provider);
};
aabmass commented 3 years ago

Note that it's not necessary to explicitly add the plugins, as those will be autodetected

@sk how are the plugins autodetected? I don't see anything about that in the OTel JS docs. The individual plugins would still have to be installed as well, but you're saying you don't need the plugins config?

    plugins: {
      express: {
        enabled: true,
        path: '@opentelemetry/plugin-express',
      },
      http: {
        enabled: true,
        path: '@opentelemetry/plugin-http',
      },
    }
sk- commented 3 years ago

@aabmass the Getting Started guide says:

OpenTelemetry can collect tracing data automatically using plugins. Vendors/Users can also create and use their own. Currently, OpenTelemetry supports automatic tracing for:

sk- commented 3 years ago

Any progress on this?

One issue we have been experiencing is that some spans are sent after the functions ends, which trigger an error, because resources are not available after the function ends.

So we would need a way for firesbase-functions/function-framework to allow us to register a completed hook and in that hook call forceFlush. See https://github.com/open-telemetry/opentelemetry-js/pull/1296