cloudevents / spec

CloudEvents Specification
https://cloudevents.io
Apache License 2.0

Need a better description for schema-url #108

Closed cathyhongzhang closed 6 years ago

cathyhongzhang commented 6 years ago

The description for schema-url is "A link to the schema that the data attribute adheres to". What is meant by "data attribute"?

duglin commented 6 years ago

@cathyhongzhang can you keep your title a bit shorter and use the comment field for the longer description :-) It's kind of awkward to have such long titles for issues.

clemensv commented 6 years ago

The data attribute is this..

The schema-url points to a document that describes what the structure of the content of that property is. What change do you propose?

yaronha commented 6 years ago

@clemensv I thought the schema field could be defining-org/schema-name, along the lines of the Class/Kind fields in Kubernetes (e.g. azure.microsoft.com/blobstore.update-message), not a full-blown URL, which would add overhead to every message.

clemensv commented 6 years ago

@yaronha the field is surely optional. Whether it's worth including a "full blown URL" depends on the event type, how long those events hang around, how the event payload is encoded, and how far they travel. If the event payload is encoded with Protobuf or Thrift or any other binary format that isn't self-contained, you literally need a schema copy to feed into your decoder.

Separating a concrete schema hint from the message is a certain recipe for having lost track of it by the time you need to pull the digitally signed event from an audit log in a year.

Schema references are a distinct concern from the scoping given by "namespace". It is absolutely debatable whether having a schema reference expression is universally important enough for inclusion into the first version of this standard (I'd be ok with cutting it), but we shouldn't conflate its purpose with that of the "namespace" disambiguation function.
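The decoder argument above can be sketched in Go: a consumer resolves the event's schema reference to a decoder before it can interpret an opaque payload. Everything here (the field names, the registry) is a hypothetical illustration, not the spec's normative attribute set:

```go
package main

import "fmt"

// Event is a pared-down envelope. SchemaURL (optional) points at a document
// describing the structure of Data; with a non-self-describing encoding like
// Protobuf, Data is useless without it.
type Event struct {
	EventType string
	SchemaURL string // optional: where to find the schema for Data
	Data      []byte // opaque payload; may need the schema to decode
}

// registry stands in for whatever resolves a schema URL to a decoder.
var registry = map[string]func([]byte) (interface{}, error){}

// decode looks up the decoder for the event's schema and applies it.
func decode(e Event) (interface{}, error) {
	dec, ok := registry[e.SchemaURL]
	if !ok {
		return nil, fmt.Errorf("no decoder for schema %q", e.SchemaURL)
	}
	return dec(e.Data)
}

func main() {
	// Trivial "decoder" for the sketch: treat the payload as a string.
	registry["https://example.com/schemas/blob-created"] = func(b []byte) (interface{}, error) {
		return string(b), nil
	}
	v, err := decode(Event{
		EventType: "Microsoft.Storage.BlobCreated",
		SchemaURL: "https://example.com/schemas/blob-created",
		Data:      []byte("file1.txt"),
	})
	fmt.Println(v, err)
}
```

The point of the sketch is that the schema reference travels inside the event, so the lookup still works when the event resurfaces from an audit log later.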

cathyhongzhang commented 6 years ago

In this schema definition, "the data attribute" is used. I am confused by "data attribute", since "metadata attributes" is used in the definition of "context". The "event" definition states that "Events include context and data", and "context" is defined as "A set of consistent metadata attributes included with the event". So I assume that "schema" here is about the structure of the data, not the structure of the context. I suggest changing the description to "A link to the schema that defines the structure of the event data". @clemensv and others, if you are OK with this change, I can create a PR for it.

duglin commented 6 years ago

I think some of our structure is a bit off. The spec says:

I think we just need to pull "data" out as a "Context Attribute" and when we get around to describing the serialization show them as siblings. For example in JSON it might be:

{
  ...context attributes...
  "data": { ... }
}

To your specific question, I think that wording change sounds fine.
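That serialization shape can be sketched in Go, with "data" kept as a peer of the context attributes (field names borrowed from the 0.1-era examples later in this thread; illustrative only):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope serializes the context attributes and "data" as JSON siblings.
// The field set here is a sketch, not the spec's normative list.
type Envelope struct {
	EventType          string          `json:"eventType"`
	EventID            string          `json:"eventID"`
	CloudEventsVersion string          `json:"cloudEventsVersion"`
	Data               json.RawMessage `json:"data"` // payload kept opaque
}

// marshalEnvelope returns the wire form with "data" as a peer of the
// context attributes rather than nested inside them.
func marshalEnvelope() string {
	e := Envelope{
		EventType:          "Microsoft.Storage.BlobCreated",
		EventID:            "602a88ef-0001-00e6-1233-1646070610ea",
		CloudEventsVersion: "0.1",
		Data:               json.RawMessage(`{"blobType":"BlockBlob"}`),
	}
	b, _ := json.Marshal(e)
	return string(b)
}

func main() {
	fmt.Println(marshalEnvelope())
}
```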

cathyhongzhang commented 6 years ago

@duglin That is why I opened issue #119: why do we need "data" in the "Context Attributes" section? I can create a PR to pull "data" out of the context attributes, but before I do, can I briefly discuss this with the community in the next meeting? My point is that since Context is metadata about the Event, there is no need for a "data" field in the context attributes, unless "data" there means the location (or something similar) of the event payload within the event message. If it really means the location of the event payload, then I can create a PR to modify the definition of "data" inside the context attributes instead.

duglin commented 6 years ago

@cathyhongzhang I suspect your interpretation aligns with the rest of the group ("data" is not a context attribute; it's more like a sibling to them when serialized), so I don't see any reason to wait before you submit a PR. I suspect it's more of a wording and "moving of text" PR than anything else, so I think it should be accepted pretty easily.

And then based on the comments in the PR, yes if we need to discuss it on the call we can.

inlined commented 6 years ago

I can't find the thread anymore, but this reminds me of our old discussion on whether context should have been a peer attribute of data in the JSON encoding. Regardless of what comes across the wire, this is effectively the model I whipped up in my Go library, because it's the best way to let developers program against their own custom types. It's pretty standard to write a handler with the shape

func(data interface{} /* dynamically discovered data type */, ctx *EventContext)

but if there's a single EventEnvelope type, then without severe effort the type of EventContext.data is wiped away and replaced with a map[string]interface{} by most naive parsers.

yaronha commented 6 years ago

@inlined see CloudEvents Go interface in nuclio: https://github.com/nuclio/nuclio/blob/master/pkg/processor/cloudevent/types.go

IMO the envelope and the data are part of the Event object. The term "context" in the function signature in Lambda, Azure, and nuclio refers to the worker/runtime context (e.g. name/version of the function, logger, data, ...) and doesn't change per event (with one exception: Lambda puts the event ID in "context"; talking about AWS inconsistency :) ).

inlined commented 6 years ago

@yaronha What is Nuclio's recommended way to handle these events and drill into the event data? Where do developers learn about the structure inside the event data? Can they leverage godocs and type safety? In your demo, did you create the S3 URI with:

url := "https://s3.amazonaws.com/" +
  event.Data["bucket"].(map[string]interface{})["name"].(string) + "/" +
  event.Data["object"].(map[string]interface{})["key"].(string)

I'm still working on open-sourcing our demo; it has a brain dump of early thoughts on a more fully fleshed-out CE library, but it exposes the types event.Context, event.Handler, and event.Mux (the latter two implement http.Handler). Our demo code is strongly typed and reuses existing libraries for S3 and GCS objects (I could not find any existing Go types for Azure that matched the demo). My code was roughly:

import (
  "fmt"
  "net/http"

  "google.com/cloudevents-demo/pkg/event"
  s3events "github.com/aws/aws-lambda-go/events"
  gcs "google.golang.org/api/storage/v1"
)

// address is the listen address for the demo server;
// doDemo is defined elsewhere in the demo.
const address = ":8080"

func main() {
  mux := event.NewMux()
  mux.Handle("aws.s3.object.created", func(entity *s3events.Entity, ctx *event.Context) {
    url := fmt.Sprintf("https://s3.amazonaws.com/%s/%s", entity.Bucket.Name, entity.Object.Key)
    doDemo("Amazon S3", url)
  })
  mux.Handle("google.storage.object.create", func(obj *gcs.Object, ctx *event.Context) {
    url := obj.MediaLink
    doDemo("Google Cloud Storage", url)
  })

  http.ListenAndServe(address, mux)
}

This type of reflection-based parsing is very idiomatic in Go, but it often doesn't work when the event context and data are exposed in the same struct. A developer would have to explicitly set a strongly typed struct in the Event.data field before passing a sample object to the handler, which then must be copied if the FaaS allows parallel requests, and the handler itself would have to type-assert the data back into its original form.

I realize there are still rough edges in my sample code. E.g. I suspect that event.Context in these handlers should be embedded inside a context.Context. Then the owning HTTP server could set parent context info like timeouts or other per-request injected traits.
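The dispatch pattern in the snippet above, where the mux discovers each handler's data type, can be sketched with reflection in stdlib Go. All names here (Mux, Handle, Dispatch, Context) are illustrative stand-ins, not the demo's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Context is a stand-in for the event context type in the snippet above.
type Context struct{ EventType string }

// Mux dispatches by event type, using reflection to allocate the handler's
// own data type and unmarshal the raw payload into it.
type Mux struct {
	handlers map[string]reflect.Value
}

func NewMux() *Mux { return &Mux{handlers: map[string]reflect.Value{}} }

// Handle registers a func(data *T, ctx *Context) under an event type.
// (The sketch assumes, without checking, that fn has exactly that shape.)
func (m *Mux) Handle(eventType string, fn interface{}) {
	m.handlers[eventType] = reflect.ValueOf(fn)
}

// Dispatch allocates a fresh *T for the registered handler, decodes the
// payload into it, and invokes the handler with a typed argument.
func (m *Mux) Dispatch(eventType string, raw []byte) error {
	fn, ok := m.handlers[eventType]
	if !ok {
		return fmt.Errorf("no handler for %q", eventType)
	}
	dataType := fn.Type().In(0).Elem() // T from the handler's *T parameter
	data := reflect.New(dataType)      // fresh *T per event, safe to run in parallel
	if err := json.Unmarshal(raw, data.Interface()); err != nil {
		return err
	}
	fn.Call([]reflect.Value{data, reflect.ValueOf(&Context{EventType: eventType})})
	return nil
}

// Object mimics a typed storage-event payload.
type Object struct {
	MediaLink string `json:"mediaLink"`
}

func main() {
	mux := NewMux()
	mux.Handle("google.storage.object.create", func(obj *Object, ctx *Context) {
		fmt.Println(obj.MediaLink)
	})
	mux.Dispatch("google.storage.object.create",
		[]byte(`{"mediaLink":"https://example.com/obj"}`))
}
```

Because each dispatch allocates a fresh typed value, there is no shared Event.data field to copy or type-assert, which is the copying problem described above.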

yaronha commented 6 years ago

@inlined you can see our KubeCon demo at: https://github.com/nuclio/demos/tree/master/kubecon-eu-18-cloudevents

It's Python, using TensorFlow for image classification.

For Go, Eran posted an implementation in the Serverless WG Slack:

package main

import (
    "encoding/json"

    "github.com/nuclio/nuclio-sdk-go"
)

func EventReturner(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
    return json.Marshal(map[string]interface{} {
        "id": event.GetID(),
        "trigger_class": event.GetTriggerInfo().GetClass(),
        "trigger_kind": event.GetTriggerInfo().GetKind(),
        "content_type": event.GetContentType(),
        "headers": event.GetHeaders(),
        "timestamp": event.GetTimestamp(),
        "path": event.GetPath(),
        "url": event.GetURL(),
        "method": event.GetMethod(),
        "shardID": event.GetShardID(),
        "totalNumShards": event.GetTotalNumShards(),
        "type": event.GetType(),
        "typeVersion": event.GetTypeVersion(),
        "version": event.GetVersion(),
        "body": event.GetBodyObject(),
    })
}

The nuclio Event object has a few extras (like shard info for parallel stream/batch processing, the HTTP method/URL, ...).

For this (structured) CE input:

{
  "source": "/subscriptions/319a9601-1ec0-0000-aebc-8fe82724c81e/resourceGroups/testrg/providers/Microsoft.Storage/storageAccounts/myaccount#/blobServices/default/containers/testcontainer/blobs/file1.txt",
  "eventType": "Microsoft.Storage.BlobCreated",
  "eventTime": "2017-08-16T01:57:26.005121Z",
  "eventID": "602a88ef-0001-00e6-1233-1646070610ea",
  "data": {
    "api": "PutBlockList",
    "clientRequestId": "799304a4-bbc5-45b6-9849-ec2c66be800a",
    "requestId": "602a88ef-0001-00e6-1233-164607000000",
    "eTag": "0x8D4E44A24ABE7F1",
    "contentType": "text/plain",
    "contentLength": 447,
    "blobType": "BlockBlob",
    "url": "https://myaccount.blob.core.windows.net/testcontainer/file1.txt",
    "sequencer": "00000000000000EB000000000000C65A"
  },
  "eventTypeVersion": "1",
  "cloudEventsVersion": "0.1"
}

the result would be:

{
  "body": {
    "api": "PutBlockList",
    "blobType": "BlockBlob",
    "clientRequestId": "799304a4-bbc5-45b6-9849-ec2c66be800a",
    "contentLength": 447,
    "contentType": "text/plain",
    "eTag": "0x8D4E44A24ABE7F1",
    "requestId": "602a88ef-0001-00e6-1233-164607000000",
    "sequencer": "00000000000000EB000000000000C65A",
    "url": "https://myaccount.blob.core.windows.net/testcontainer/file1.txt"
  },
  "content_type": "",
  "headers": null,
  "id": "602a88ef-0001-00e6-1233-1646070610ea",
  "method": "POST",
  "path": "/",
  "shardID": -1,
  "timestamp": "2017-08-16T01:57:26.005121Z",
  "totalNumShards": 0,
  "trigger_class": "unsupported",
  "trigger_kind": "/subscriptions/319a9601-1ec0-0000-aebc-8fe82724c81e/resourceGroups/testrg/providers/Microsoft.Storage/storageAccounts/myaccount#/blobServices/default/containers/testcontainer/blobs/file1.txt",
  "type": "Microsoft.Storage.BlobCreated",
  "typeVersion": "1",
  "url": "",
  "version": "0.1"
}

inlined commented 6 years ago

I don't think that answered my question. How does a developer interact with the event data/body? Do they type-cast all the way down?

yaronha commented 6 years ago

@inlined you can use a map, or JSON libs that don't require a schema, like: https://github.com/buger/jsonparser/blob/master/README.md

inlined commented 6 years ago

I think this will just be one of our stylistic differences. In my own SDKs I will try to push back on stringly typed systems since I find they lead to user error and customer support burden.

yaronha commented 6 years ago

@inlined I wasn't expressing an opinion/style, just stating that you have 3 options in typed languages like Go: 1. use type-specific structs, 2. use map[string]interface{}, 3. use iterators like jsonparser.

I guess the "opinion" also depends on the use case :)
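The first two of those options can be contrasted in a short stdlib sketch (the third, jsonparser, is a third-party library, so it is omitted here); the payload is a made-up fragment of an S3-style event:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is an illustrative fragment of an S3-style event body.
var payload = []byte(`{"object":{"key":"file1.txt"}}`)

// Option 1: a type-specific struct; field access is compiler-checked
// and discoverable via godoc.
type S3Data struct {
	Object struct {
		Key string `json:"key"`
	} `json:"object"`
}

func typed() string {
	var d S3Data
	json.Unmarshal(payload, &d)
	return d.Object.Key
}

// Option 2: map[string]interface{}; type assertions "all the way down",
// each of which can panic if the shape is not what the code expects.
func untyped() string {
	var m map[string]interface{}
	json.Unmarshal(payload, &m)
	return m["object"].(map[string]interface{})["key"].(string)
}

func main() {
	fmt.Println(typed(), untyped())
}
```

Both return the same value here; the difference is where a shape mismatch surfaces (a zero value vs. a runtime panic), which is the support-burden point raised above.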

rspielmann commented 6 years ago

Reading the spec and then stumbling across this issue, I was and still am confused in a slightly different way, but I'm not (yet) willing to open a new issue for it: the wording "the data attribute" occurs to me like it's wrong in the context of JSON. Shouldn't it be called "the data property" or "the value of the data property"? When I read something like "the data attribute", I personally think of HTML, XML etc., but not of JSON.

duglin commented 6 years ago

@cathyhongzhang is this still an issue? If so, do you want to open a PR with some suggested text?

cathyhongzhang commented 6 years ago

After all the changes to data and the data attribute, I do not see an issue anymore. Thanks.

duglin commented 6 years ago

ok thanks @cathyhongzhang - closing for you