GoogleCloudPlatform / functions-framework

The Contract for Building New Function Frameworks
Apache License 2.0
120 stars 12 forks source link

Expose HTTP headers in CloudEvent Signature #34

Open grant opened 3 years ago

grant commented 3 years ago

GCF event requests send information in the HTTP headers that we currently don't expose to the user.

Example HTTP headers seen from a GCF V1 Pub/Sub event (here, represented as an object):

{
        "host": "2103cfee64826ae687aca0732ae6f7fd-dot-ucd5d3fce07973027-tp.appspot.com",
        "user-agent": "CloudPubSub-Google",
        "content-length": "392",
        "accept-charset": "UTF-8",
        "content-type": "application/json",
        "forwarded": "for=\"2002:a30:591a:0:b029:f3:1843:9b30\";proto=https",
        "function-execution-id": "bxb97ephkgb0",
        "traceparent": "00-65088630f09e0a5359677a7429456db7-97f23477fb2bf5ec-01",
        "x-appengine-country": "ZZ",
        "x-appengine-default-version-hostname": "ucd5d3fce07973g27-tp.appspot.com",
        "x-appengine-https": "on",
        "x-appengine-request-log-id": "5f50211300ff05432dbed9b52a0001737e75636435643366636530343937333032372d7470000132313033636665653634383236326536383761636130373332616536663766643a3100010102",
        "x-appengine-timeout-ms": "599999",
        "x-appengine-user-ip": "2002:a30:591a:0:b029:f3:1843:9b30",
        "x-cloud-trace-context": "65088637f09e0a5159677a7429456db7/10948871334010811884;o=1",
        "x-forwarded-for": "2002:a30:591a:0:b029:f3:1843:9b30",
        "x-forwarded-proto": "https",
        "x-google-internal-skipadmincheck": "true",
        "connection": "close"
}

V2:

{
  "host": "headers-v2-qqawgke5q-uw.a.run.app",
  "user-agent": "curl/7.64.1",
  "accept": "*/*",
  "x-cloud-trace-context": "2e5erfcc853dedff4c893b310dc0efdf/7061815258238150754;o=1",
  "x-client-data": "CgSL6ZyV",
  "traceparent": "00-2e5e4hcc853dedff4c893b310dc0efdf-62009b8ff0b7e862-01",
  "x-forwarded-for": "67.142.181.51",
  "x-forwarded-proto": "https",
  "forwarded": "for=\"67.142.181.51\";proto=https"
}

Other events besides Pub/Sub likely include some of these headers, or will in V2.

I'd assume that a CloudEvent signature would have something like:

function myFunction(cloudevent) {
  // The HTTP POST standard CloudEvent Headers
  // cloudevent.id
  // cloudevent.source
  // cloudevent.specversion
  // cloudevent.type
  // cloudevent.datacontenttype
  // cloudevent.dataschema
  // cloudevent.subject
  // cloudevent.time
  // cloudevent['x-cloud-trace-context']
  // cloudevent['user-agent']
  // cloudevent...
  // cloudevent.data // The HTTP POST body
}

Example HTTP headers:

{
  "host": "us-central1-project.cloudfunctions.net",
  "user-agent": "curl/7.64.1",
  "accept": "*/*",
  "forwarded": "for=\"67.143.181.51\";proto=https",
  "function-execution-id": "zve62c0a0o98",
  "traceparent": "00-100f0701fg2481493519e03f1d9d07d8-6111dfe1b4f54545-01",
  "x-appengine-city": "sf",
  "x-appengine-citylatlong": "36.851497,-16.171646",
  "x-appengine-country": "US",
  "x-appengine-default-version-hostname": "i89ff4052e6f73771p-tp.appspot.com",
  "x-appengine-https": "on",
  "x-appengine-region": "tx",
  "x-appengine-request-log-id": "60e77ea800ff02c0ebef425g4c0001737e6938396666343035326536633733373731702d7470000133626636363638333063373462396132616639623230643034336330333832393a3200010101",
  "x-appengine-timeout-ms": "599999",
  "x-appengine-user-ip": "67.123.181.51",
  "x-client-data": "CgSL6fsV",
  "x-cloud-trace-context": "103f0701f52481493519e03f1d9d07d8/6994617856779699525;o=1",
  "x-forwarded-for": "67.143.151.51",
  "x-forwarded-proto": "https",
  "connection": "close"
}

I assume we can just expose all of these headers to the function's first parameter. A user can decide which ones they care about. A CloudEvent

There should be no clash of cloudevent as none of these identifiers: "id", "source", "specversion", "type", "datacontenttype", "dataschema", "subject", "time" are HTTP headers.


We can also add this to the Event signature, though not really sure what interface we want.


(This raw HTTP request was captured using a custom Node Functions Framework)

jskeet commented 3 years ago

I assume we can just expose all of these headers to the function's first parameter.

Do you mean just adding them as attributes to the CloudEvent? I'm not sure that that's a good idea, if they're not logically part of the CloudEvent.

Suppose the function wants to pass the CloudEvent on to something else to do the processing - if we've added extra things in that really aren't part of the CloudEvent, the function would need to strip those out first, which seems unfortunate.

I think I'd prefer to have a separate parameter in the event signature, although that ends up being quite long-winded I suspect.

Can we gather specific use cases for this? We could then try to come up with "what would the function author want to do in each case" examples. Note that tracing is particularly interesting as there are two potential trace contexts - the trace context of the event itself (e.g. "a GCS object was finalized while processing record X, which was created due to user activity Y"), and the trace context of event propagation. We should work out what needs to happen with each of those, and how they're propagated.

grant commented 3 years ago

Yeah, I'd probably not want them in the CE parameter as it can look and work not the best (need to try it out).

Things like trace might be relevant in a CE object. There's a documented extension: https://github.com/cloudevents/spec/blob/master/extensions/distributed-tracing.md

See the internal GCF APM PRD.

dazuma commented 3 years ago

Thinking about this, I have to agree that it doesn't make sense to expose all/arbitrary headers, as they are part of the transport, not part of the CloudEvent itself. (We should not be adding fields to a CE that were not set by the sender.) I also don't know why a function should care about headers in the general case. Functions deal with logical events, and the fact that it came via HTTP vs something else should be abstracted away.

That said, for the legacy event conversion, it probably does make sense to augment the conversion algorithm to include some additional information, especially where there's a documented extension such as distributed tracing. If an extension exists, I would assume a use case exists that drove the extension. But if there isn't a well-known extension, I'd still be hesitant. Adding any arbitrary header may be problematic—even if none of the headers clash with standard fields now, doesn't mean they won't in the future.

grayside commented 3 years ago

Beyond collision, it expands the contract of CloudEvents to include more of the transport layer, where it's not unreasonable to imagine the transport layer might evolve at a different pace.

Can we gather specific use cases for this? We could then try to come up with "what would the function author want to do in each case" examples.

+1 to this. In some cases those headers are what CloudEvents are meant to replace through it's own standard (e.g., function-execution-id). In others the headers are a means to an end. Which of those ends apply and should be built into the FF+CloudEvents contract? This is a good meta issue, but potential header values to include should be considered one at a time.

jskeet commented 3 years ago

While doing a v0.3-v1.0 comparison of the CloudEvents spec (fun times, y'all!) I found this interesting paragraph:

Many protocols support the ability for senders to include additional metadata, for example as HTTP headers. While a CloudEvents receiver is not mandated to process and pass them along, it is RECOMMENDED that they do so via some mechanism that makes it clear they are non-CloudEvents metadata.

That would suggest that the various CloudEvents SDKs might want to consider supporting this in the first instance, at which point it should be easy to integrate into the Functions Frameworks.

dazuma commented 3 years ago

@jskeet I did not notice that paragraph. That's definitely suggestive, although I'm not clear what exactly it means. I'll ask in the CloudEvents SDK slack if anyone is doing this and what it looks like.

dazuma commented 3 years ago

So I asked in the CE SDK slack, and people agree that the paragraph in the spec sounds somewhat misleading. The intent was to address extension attributes. It means, a CE process is not required to recognize and propagate all possible extension attributes (but it should if it can). The paragraph is not intended to address arbitrary HTTP headers that have nothing to do with CE or CE extensions (e.g. "User-Agent"). CE is not intended to be an HTTP proxy (especially because it is supposed to be transport-agnostic).

So that said, I stand by my recommendation. We should not be processing arbitrary HTTP headers in the general case. If someone (not a Google service for whom we're explicitly converting legacy events) sends us a CloudEvent, and includes a User-Agent header, that User-Agent is not part of the cloud event, and should not be exposed to the user. Similarly, when converting legacy events (such as from Storage or PubSub), we should not be processing arbitrary HTTP headers. However, if there are specific headers that provide important data (such as trace correlation context), and fit into a transport-agnostic event paradigm (and preferably fit into an existing CE attribute or extension), then I think we can consider updating the legacy event conversion to bring that data in.

govindrai commented 3 years ago

I think what may be lost here is why this issue was opened in the first place.

Adding context:

This issue was opened as a response to this thread: https://github.com/googleapis/nodejs-logging/issues/591

The primary use case is debugging. We want to be able to conveniently look at all logs related to the execution of one event. This can be done with http functions since they expose headers and we can look at the trace and execution id and add them to each log and now all the logs can easily be looked up via the trace and/or execution id.

However with background functions, no trace/execution id is exposed. This makes it so we cannot connect all logs related to these function's executions and makes debugging a lot more painful.

Having a way to easily debug high traffic cloud functions is essential, and this will definitely help a ton.

Personally, I would have loved Cloud Functions to expose the trace/execution Id as part of the environment, regardless of background, http, firebase function.

Here's the difference in logging with access to executionId and not having access (Please note in both instances I've only invoked the function once so I know exactly what logs below to which execution, but functions running in prod are being invoked 10s of times per second and all the logs are jumbled.

HTTP function exposing execution id image

Firebase Function exposing no execution id image

grant commented 3 years ago

OK, thanks for the comment @govindrai.

It seems like we just need to expose these headers:

req.get('function-execution-id');
req.get('x-cloud-trace-context');

The POC exposes them to the user via the context with background functions, like (event, _context_) =>:

context.executionId = executionId;
context.traceId = traceId;
JSteeleIR commented 3 years ago

Hello, any progress on exporting these headers to background functions in the context?

dazuma commented 3 years ago

@jskeet What was the status on the spec for legacy event to cloud event conversion? It seems like we may need to convert and expose these two headers, if present.

jskeet commented 3 years ago

@dazuma: We still have a few things to iron out for some Firebase events. I'm not sure what to do in terms of these headers, but we should work that out.

JamesAtIntegratnIO commented 3 years ago

Hello, really looking for this feature. I'm fixing logging in a cloud function right now and implementing winston logger. I lose the execution_id by doing this and I'm trying to figure out how to get it back.

ace-n commented 2 years ago

Extension attributes to the CloudEvent specification are meant to be additional metadata that needs to be included to help ensure proper routing and processing of the CloudEvent. Additional metadata for other purposes, that is related to the event itself and not needed in the transportation or processing of the CloudEvent, should instead be placed within the proper extensibility points of the event (data) itself.

Based on the paragraph above (sourced from this doc) and Daniel's comment above, it seems like extension attributes are indeed the best approach here.

Moreover, there are already extension attributes for Distributed Tracing. However, I'm not sure if: a) those attributes refer to a specific (ostensibly non-GCP) product b) using those attributes internally would interfere with third-party software