cloudevents / spec

CloudEvents Specification
https://cloudevents.io
Apache License 2.0

Mandatory fields in Avro representation of CloudEvents. #575

Open arvindkasale08 opened 4 years ago

arvindkasale08 commented 4 years ago

Hi Team,

We are analyzing the use of the CloudEvents Avro format to send Kafka events. The native/core CloudEvent representation has id, source, specversion, and type as the mandatory fields in a CloudEvent (see here). Contrary to this, the CloudEvents Avro representation has a generic attributes map and a provision for data (see here). We had the following questions.

  1. What is the specific rationale behind losing the strong typing and validation of the core CloudEvent in the Avro representation? Is there a benefit to keeping an attribute map in place of specific, strongly typed id, source, specversion, and type fields, with the other optional metadata as attributes?

  2. How does this map to working with Knative Eventing? How does a system with a KafkaSource route messages based on a CloudEvent Avro representation?

  3. In Knative Eventing with a KafkaSource using Avro, is there a predisposition toward binary or structured mode?

  4. How would schema validation for such a CloudEvent take place if the data fields are optional?
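For readers comparing the two approaches, here is a hedged sketch (not the exact schema from either repository) of the difference being discussed: the published Avro format keeps all context attributes in one generic map, whereas a strongly typed alternative would declare the four REQUIRED attributes as named fields:

```json
{
  "type": "record",
  "name": "CloudEventTypedSketch",
  "doc": "Illustrative only: the four REQUIRED context attributes as typed fields instead of map entries.",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "source", "type": "string"},
    {"name": "specversion", "type": "string"},
    {"name": "type", "type": "string"},
    {
      "name": "attribute",
      "doc": "Remaining optional and extension attributes, kept as a generic map.",
      "type": {"type": "map", "values": ["null", "boolean", "int", "string", "bytes"]}
    },
    {"name": "data", "type": ["null", "bytes", "string"], "default": null}
  ]
}
```

With a shape like this, a deserializer or schema registry can reject an event that is missing id or type, which the all-in-a-map layout cannot do.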

duglin commented 4 years ago

ping @fabiojose

BigJerBD commented 4 years ago

+1

I also have trouble understanding the existence of the attribute field, when the structured CloudEvent representation puts everything at the same level.

The example shown puts the data and all the other attributes at the same level. However, the Avro CloudEvents schema specifies them in two different fields.

fabiojose commented 4 years ago

@arvindkasale08 Let's start a revision of the current schema definition. What do you think?

fabiojose commented 4 years ago

Maybe something like that, but it may be necessary to take into account the historical steps that led to the current schema.

Instead of a generic attribute map, we might have something like this for the required attributes:

{
  "name": "id",
  "type": "string"
}

And this, for the optional ones:

{
  "name": "dataschema",
  "type": ["null", "string"]
}

BigJerBD commented 4 years ago

Our organisation, which is currently designing a custom Avro CloudEvent with extra fields based on our requirements, currently prefers having required attributes in that fashion.

We also set 1.0 as the default spec version:

{
  "name": "specversion",
  "type": "string",
  "default": "1.0"
}

The use of built-in logicalTypes is also being considered. Avro supports uuid and timestamp-millis, which would fit the id and time fields. I wonder whether that could be appropriate for the official CloudEvents Avro format.

Logical types (at the bottom): https://avro.apache.org/docs/current/spec.html
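A sketch of what that suggestion could look like (assuming timestamp-millis precision is acceptable; field names follow the CloudEvents spec):

```json
{
  "type": "record",
  "name": "CloudEventLogicalTypesSketch",
  "doc": "Illustrative only: id and time annotated with Avro built-in logical types.",
  "fields": [
    {"name": "id", "type": {"type": "string", "logicalType": "uuid"}},
    {
      "name": "time",
      "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}],
      "default": null
    }
  ]
}
```

One caveat: the CloudEvents spec only requires id to be unique per source, not to be a UUID, so the uuid logical type would narrow what producers are allowed to send.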

fabiojose commented 4 years ago

Good, let's work on that @arvindkasale08.

fabiojose commented 4 years ago

What do you think about opening a PR to propose changes based on your experience?

duglin commented 4 years ago

I don't know Avro, so I don't know if this applies or not, but one of the reasons we didn't define per-attribute fields for some protocols is forward compatibility. Take as an example the case where a protocol might want the message to look like this:

- spec-defined-attr1
- spec-defined-attr2
- extensions
   - mycoolextension1
   - mycoolextension2

If, at some point in the future, mycoolextension1 were promoted to be a first-class attribute in the spec, then the schema of the message would change. The syntax of the message would change, meaning the extension would move from the "extensions" bucket out to be a top-level attribute, breaking all existing receivers that don't upgrade. And receivers that knew about this change would then need to look in both places if they wanted to support both old and new versions of the spec. This is one of the reasons HTTP header extensions dropped the "X-" prefix.

We wanted to be able to add new top-level attributes as optional attributes without introducing a major version bump and without breaking existing implementations that might be using those extensions.

For reasons like this, we wanted ALL CE attributes (spec defined and extensions) to be treated the same way. See: https://github.com/cloudevents/spec/blob/master/primer.md#cloudevent-attribute-extensions for more info.
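To make the promotion scenario concrete, here is an illustrative before/after (JSON with comments, not valid Avro; mycoolextension1 is the invented name from the example above):

```jsonc
// Before: the extension lives in the "extensions" bucket.
{
  "spec-defined-attr1": "...",
  "spec-defined-attr2": "...",
  "extensions": {
    "mycoolextension1": "...",
    "mycoolextension2": "..."
  }
}

// After promotion: the same logical attribute is now top-level, so the
// record's schema changes and receivers must check both locations.
{
  "spec-defined-attr1": "...",
  "spec-defined-attr2": "...",
  "mycoolextension1": "...",
  "extensions": {
    "mycoolextension2": "..."
  }
}
```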

As I said, I don't know if this applies to Avro or not, but I wanted to mention it

fabiojose commented 4 years ago

That's it, I did not remember that!

I am working with Avro and will provide examples as soon as possible.

Stay in touch @arvindkasale08

BigJerBD commented 4 years ago

Sorry for sharing maybe too much of my opinion when I am just a simple user of both CloudEvents and Avro, but I feel it's important to mention.

It is possible to support forward/backward compatibility on plain (non-map) fields with proper use of defaults.

Quote from this link

To maintain compatibility, you may only add or remove a field that has a default value. (The field favoriteNumber in our Avro schema has a default value of null.) For example, say you add a field with a default value, so this new field exists in the new schema but not the old one. When a reader using the new schema reads a record written with the old schema, the default value is filled in for the missing field.

If you were to add a field that has no default value, new readers wouldn't be able to read data written by old writers, so you would break backward compatibility. If you were to remove a field that has no default value, old readers wouldn't be able to read data written by new writers, so you would break forward compatibility.

Suppose there is a need to add a new optional field: putting "default": null will be good enough, since all previous Avro CloudEvents will have this value inferred on deserialisation.

On the flip side, new mandatory fields without a default are of course breaking changes that will cause compatibility issues, but that should be expected, because they are mandatory. Storing all fields in a map seems to be an antipattern, since it does not use Avro's main strengths.
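As a sketch, adding a new optional attribute compatibly could look like this (reusing the dataschema field from the earlier comment; note that for a ["null", ...] union the default must be the literal null, matching the union's first branch):

```json
{
  "name": "dataschema",
  "type": ["null", "string"],
  "default": null,
  "doc": "New optional field: the null default keeps old and new readers/writers compatible under Avro schema resolution."
}
```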

I would be happy to open a PR for an alternative, maybe more viable, schema if that is not a problem :slightly_smiling_face:

yuce commented 3 years ago

+1

Avro without a schema is not very useful. As others mentioned in this ticket, using a map for attributes impairs type safety, decreases encoding/decoding performance, and increases output size. The following event encoded using the CE Avro 1.0 schema takes 216 bytes, while the alternative schema at https://github.com/scaleplandev/spce-java/blob/master/etc/cloudevents_spce_spec.avsc, which has a field for each required and optional attribute, takes 144 bytes:

{
  "datacontenttype": "application/json",
  "subject": "foobar",
  "specversion": "1.0",
  "source": "/example/source",
  "id": "1234",
  "time": "2020-10-05T13:20:25.223Z",
  "type": "example.type",
  "dataschema": "https://temperature.com/schema",
  "data": "{\"temperature\": 38}"
}

Using the alternative schema improves the encoding/decoding speed dramatically, benchmark at: https://github.com/yuce/cloudevents-java-benchmarks

As far as I can see, the only reason for keeping the generic schema is future compatibility. How about attaching a version number to the content type, similar to what GitHub does (https://developer.github.com/v3/media/#request-specific-version), e.g. application/cloudevents.v2+avro?

duglin commented 1 year ago

@yuce @fabiojose @BigJerBD where are we on this one? Is there someone willing to create a PR to address the concern?

duglin commented 11 months ago

@yuce @fabiojose @BigJerBD does anyone want to follow up on this one? I'm going to suggest we close it. We can re-open it if someone wants to continue with it.

duglin commented 11 months ago

@arvindkasale08 does https://github.com/cloudevents/spec/blob/main/cloudevents/working-drafts/avro-compact-format.md address your concerns?