asyncapi / spec

The AsyncAPI specification allows you to create machine-readable definitions of your asynchronous APIs.
https://www.asyncapi.com
Apache License 2.0

[FEATURE REQUEST] Natively support envelopes, e.g., CloudEvents #432

Open fmvilas opened 4 years ago

fmvilas commented 4 years ago

It's a common practice to use message envelopes in event-driven architectures. Most of the time, these envelopes contain information about the company, the domain context, or other application-specific data. Envelopes are also a great way to standardize message structure across the whole organization, making it easier to build code around it.

The most common case I've found so far, aside from custom formats, is the use of CloudEvents, especially together with Kafka and Avro. Trying to define such a system comes with confusing decisions:

  1. What should schemaFormat look like? Should it be JSON Schema, Avro, or CloudEvents? Everything could be defined using JSON Schema or Avro, but since the "data" of the envelope is Avro, things get confusing here. Is CloudEvents a schema format at all? I don't think so.
  2. Should we define the message on the AsyncAPI file as if the envelope wasn't there? Or should we include it somehow? If we include it on every message payload, we'll then have to maintain a lot of repeated information across messages, making it harder to update the envelope on every single message.
  3. Should we put this information as a protocol binding? If so, is CloudEvents really a protocol? I don't think so.
  4. Should we just use an extension? Probably yes as a starting point, but this case is very frequent and I would love to have AsyncAPI recognize it as a first-class citizen.

For these reasons, I think it would be interesting to come up with a first-class solution for envelopes in AsyncAPI. This would make people feel everything is integrated better instead of hacking the spec here and there.

Example

Please take this example as an illustration of what I'm trying to accomplish here. By no means is it a final solution.

asyncapi: 2.0.0
channels:
  test:
    message:
      schemaFormat: 'application/vnd.apache.avro;version=1.9.0'
      envelope:
        format: application/cloudevents # This would tell parsers that it's a CloudEvents envelope. Can be optional.
        schema:
          type: object
          properties:
            specversion:
              type: string
              enum: ['1.0', '1.1'] # In case we want to restrict to specific versions of the CloudEvents spec
            myCustomField: # Define custom fields here. Those that are not defined on the CE spec.
              type: string
      payload:
        type: record
        # ... more Avro stuff

Open Questions/Thoughts

  1. What if we could define that CloudEvents (CE) message fields must be placed in the headers or in the payload? CE has its own bindings and, in some of them, they allow you to map the fields to headers or to the payload/body. It would be great to have a way to define that.
  2. Use envelope + bindings. For instance, in the case of CE, we could use a binding called cloudevents that will allow us to define how it's used (not used, binary, structured, both). I don't think we want to define that the envelope fields are going to be placed on the headers section because then they would not be an envelope anymore, right? Food for thought tho.
  3. Should we support in-house envelopes at all? Or should these people be "penalized" for not using a standard format? In any case, we must consider that even CE allows you to define custom fields and their support is a must.

fnobilia commented 4 years ago

That's a very interesting point and a super valid use case.

In the next community call I wanted to suggest something similar called event metadata. In the past months I found a lot of useful use cases based on generalised metadata.

I am wondering whether the envelope concept is too related to cloudevents or it can be easily generalised. From what I remember about cloudevents, their envelope has a very specific schema.

fmvilas commented 4 years ago

> In the next community call I wanted to suggest something similar called event metadata.

I usually recommend putting metadata in headers but agree it's not always possible. For instance, when using WebSockets, there are no headers. Let's discuss it on the next call (I've added it).

> I am wondering whether the envelope concept is too related to cloudevents or it can be easily generalised. From what I remember about cloudevents, their envelope has a very specific schema.

In my example, I'm trying to make it generic enough, that's why it has an optional format field and an optional schema field. If your envelope doesn't follow any standards (i.e., it's built in-house) you can leverage schema. That should be enough for custom envelopes, I think.
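
A minimal sketch of that in-house case, assuming the `envelope` field proposed in this issue (it is not part of the AsyncAPI spec). The field names `tenant` and `emittedAt` and the `amount` payload property are hypothetical and only there for illustration:

```yaml
asyncapi: 2.0.0
info:
  title: In-house envelope sketch
  version: '0.1.0'
channels:
  test:
    publish:
      message:
        # `envelope` is the field proposed in this issue, not part of the spec.
        envelope:
          # No `format` here: the envelope follows no named standard.
          schema:
            type: object
            properties:
              tenant: # hypothetical in-house field
                type: string
              emittedAt: # hypothetical in-house field
                type: string
                format: date-time
        payload:
          type: object
          properties:
            amount: # hypothetical business field
              type: number
```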

derberg commented 4 years ago

Do we really need a new envelope section? CloudEvents is not a protocol, but the definition of bindings in AsyncAPI is: "AsyncAPI offers a mechanism called 'bindings' that aims to help with more specific information about the protocol and/or the topology". So isn't it part of the topology? And even if I'm super wrong here, wouldn't it be better to just extend the scope of bindings and have a cloudevents binding, instead of having an envelope section that so far would only work for CloudEvents?

fmvilas commented 4 years ago

That would work for CloudEvents but not for custom envelope formats, which are even more common than CloudEvents, especially in the enterprise. Also, I have the feeling that if we start putting too much stuff in the bindings it's gonna become a black box nobody wants to open.

Paul-T-AU commented 4 years ago

Something to consider: as I understand CloudEvents, when working with HTTP for example, the "envelope" can wrap the message body (which of course fits nicely into the proposal), it can live in the message headers with the body unchanged, or it can be both a wrapper and in the headers. For me at least, that gets a little confusing if defined as an envelope block.

Of course, not all protocols support these options. You could place a configuration flag in the envelope specifying where the envelope is to be placed, but this is surely protocol dependent.

My 2 cents on how this could be handled:

  1. Use the envelope section and then, within the binding, specify how the CloudEvent is used (not used, binary, structured, both).
  2. Define CloudEvents within the binding as required for that implementation of the protocol.

IMHO, [1] might be a good option, as the protocol-specific handling can be restricted to the binding.
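
A rough sketch of what option [1] could look like. Everything here is an assumption for illustration: `envelope` is the field proposed in this issue, and neither a `cloudevents` binding nor its `mode` field exists in the AsyncAPI bindings today.

```yaml
channels:
  test:
    publish:
      message:
        # Proposed in this issue, not part of the spec.
        envelope:
          format: application/cloudevents
        bindings:
          # Hypothetical binding name and field.
          cloudevents:
            mode: structured # one of: not used, binary, structured, both
        payload:
          type: object
```

The envelope block stays protocol-agnostic, and the question of where the CloudEvents attributes physically go (headers or body) is answered only in the binding.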

fmvilas commented 4 years ago

Let me recap and clarify what this proposal is and isn't about, just to make sure we're all aligned here:

✅ Add a way to specify that a message's payload is wrapped in an envelope.
✅ Make it easy and obvious for people to define their messages in CloudEvents + Avro/JSON Schema/etc.
✅ Make it easy and obvious for people to define any envelope format, including those built in-house that have no specific name or the name is internal to the organization.
✅ Place this information in the spec in a semantic way. To clarify, envelope has a clear meaning while using a binding for an envelope doesn't really make it clear if it's an envelope or it's a protocol (MQTT, AMQP, Kafka, etc.).

❌ Create something specific to CloudEvents.
❌ Create something specific to any format because we might not know the name in advance, so it can't be placed at bindings. The envelope format might not even have a name at all and it's just known inside the organization as "the envelope" or "the wrapper", etc.
❌ Have a way to define envelopes for headers or any other section that isn't payload.


Now, IMO, there are a couple of interesting points that were raised during the last call and on the last comment from @Paul-T-AU:

  1. What if we could define that CloudEvents (CE) message fields must be placed in the headers or in the payload? CE has its own bindings and, in some of them, they allow you to map the fields to headers or to the payload/body. It would be great to have a way to define that.
  2. Use envelope + bindings. For instance, in the case of CE, we could use a binding called cloudevents that will allow us to define how it's used (not used, binary, structured, both). I don't think we want to define that the envelope fields are going to be placed on the headers section because then they would not be an envelope anymore, right? Food for thought tho.
  3. Should we support in-house envelopes at all? Or should these people be "penalized" for not using a standard format? In any case, we must consider that even CE allows you to define custom fields and their support is a must.

I'm adding these questions/thoughts to the original comment. Anything else I should be adding in your opinion?

Paul-T-AU commented 4 years ago

@fmvilas A point of clarification, when we say CloudEvents, is the proposal to be CloudEvents compliant or CloudEvents like/inspired?

fmvilas commented 4 years ago

CloudEvents compliant. The idea is to allow users of CloudEvents to reuse their existing messages.

apaezg commented 4 years ago

I am interested in the envelope feature for the built in-house case. Our current setup runs over Kafka with JSON format, and with current capabilities it looks something like this:

asyncapi: 2.0.0
info:
  title: Some application
  version: '0.1.0'
servers:
  kafka-rest-proxy:
    url: http://{kafkadomain}:{port}
    protocol: kafka
    variables:
      kafkadomain:
        default: kafka-rest-proxy
      port:
        default: '8082'
channels:
  item-created:
    publish:
      message:
        $ref: '#/components/messages/MessageWithEnvelopeExample'
components:
  messages:
    MessageWithEnvelopeExample:
      payload:
        type: object
        properties:
          timestamp:
            type: string
          uuid:
            type: string
          payload:
            type: object
            properties:
              pk:
                type: integer
                minimum: 1
                description: Primary key of the related object
              value:
                $ref: '#/components/schemas/Item'
              extra:
                type: object
  schemas:
    Item:
      type: object
      properties:
        amount:
          type: number
          format: float
          description: Some number

Does this match the intended feature scope?

apaezg commented 4 years ago

Following up on my previous comment: first, we have come up with an alternative that avoids repeating the envelope in each message, by using composition with allOf.

asyncapi: 2.0.0
info:
  title: Some application
  version: '0.1.0'
servers:
  kafka-rest-proxy:
    url: http://{kafkadomain}:{port}
    protocol: kafka
    variables:
      kafkadomain:
        default: kafka-rest-proxy
      port:
        default: '8082'

channels:
  item1-created:
    publish:
      message:
        $ref: '#/components/messages/MessageWithEnvelopeExample1'
  item2-created:
    publish:
      message:
        $ref: '#/components/messages/MessageWithEnvelopeExample2'

components:
  messages:
    MessageWithEnvelopeExample1:
      payload:
        allOf:
          - $ref: '#/components/schemas/Envelope'
          - $ref: '#/components/schemas/PayloadItem1'

    MessageWithEnvelopeExample2:
      payload:
        allOf:
          - $ref: '#/components/schemas/Envelope'
          - $ref: '#/components/schemas/PayloadItem2'

  schemas:
    Envelope:
      type: object
      additionalProperties: false
      properties:
        fired_at:
          type: string
          format: date-time
          description: Represents the moment the message was fired.
        uuid:
          type: string
          description: Uniquely identifies the event and allows for detecting duplicates.

    PayloadItem1:
      type: object
      additionalProperties: false
      properties:
        payload:
          type: object
          properties:
            pk:
              type: integer
              minimum: 1
              description: Primary key of the related object
            value:
              $ref: '#/components/schemas/Item1'
            extra:
              type: object

    PayloadItem2:
      type: object
      additionalProperties: false
      properties:
        payload:
          type: object
          properties:
            pk:
              type: integer
              minimum: 1
              description: Primary key of the related object
            value:
              $ref: '#/components/schemas/Item2'
            extra:
              type: object

    Item1:
      type: object
      properties:
        amount:
          type: number
          format: float
          description: Some number

    Item2:
      type: object
      properties:
        name:
          type: string
          description: Some name

I guess this has other cons besides not declaring explicitly that there is an envelope, which clients may need to treat as a separate concern, but the result looks quite similar to what would be expected with the proposal.
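
One such con worth flagging: in JSON Schema, `additionalProperties` only takes into account the `properties` declared in the same schema object, and each `allOf` branch is validated on its own. With `additionalProperties: false` in both `Envelope` and `PayloadItem1`, the envelope branch rejects `payload` and the payload branch rejects `fired_at` and `uuid`, so a message carrying all three fields fails validation. A minimal sketch of the same composition with that keyword dropped from the composed parts (one possible fix, not the only one):

```yaml
components:
  messages:
    MessageWithEnvelopeExample1:
      payload:
        allOf:
          - $ref: '#/components/schemas/Envelope'
          - $ref: '#/components/schemas/PayloadItem1'
  schemas:
    Envelope:
      type: object
      # No `additionalProperties: false`, so `payload` is allowed through.
      properties:
        fired_at:
          type: string
          format: date-time
        uuid:
          type: string
    PayloadItem1:
      type: object
      # No `additionalProperties: false`, so `fired_at` and `uuid` are allowed through.
      properties:
        payload:
          type: object
          properties:
            pk:
              type: integer
              minimum: 1
```

The trade-off is that unknown top-level fields are no longer rejected.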

That said, following the proposal's goal of being CloudEvents friendly, what I would need for it to be friendly to my case as well is a way to define which key contains "the payload data". In the case of CloudEvents, it is contained under "data"; in my case it is under "payload".

So, if I adapt the proposal to my example, it could look something like this:

components:
  messages:
    MessageWithEnvelopeExample1:
      envelope:
        $ref: '#/components/schemas/Envelope'
      payload:
        $ref: '#/components/schemas/PayloadItem1'

    MessageWithEnvelopeExample2:
      envelope:
        $ref: '#/components/schemas/Envelope'
      payload:
        $ref: '#/components/schemas/PayloadItem2'

  schemas:
    Envelope:
      type: object
      additionalProperties: false
      properties:
        fired_at:
          type: string
          format: date-time
          description: Represents the moment the message was fired.
        uuid:
          type: string
          description: Uniquely identifies the event and allows for detecting duplicates.
      payload-ref: payload  # optional, when not present it will set 'data' as in the case of CloudEvents.

    PayloadItem1:
      type: object
      properties:  # now I wouldn't need to double-nest payload thanks to payload-ref
        pk:
          type: integer
          minimum: 1
          description: Primary key of the related object
        value:
          $ref: '#/components/schemas/Item1'
        extra:
          type: object

    PayloadItem2:
      type: object
      properties: # now I wouldn't need to double-nest payload thanks to payload-ref
        pk:
          type: integer
          minimum: 1
          description: Primary key of the related object
        value:
          $ref: '#/components/schemas/Item2'
        extra:
          type: object

payload-ref (or a better name) would act somewhat like discriminator does when defining polymorphism.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity :sleeping: It will be closed in 30 days if no further activity occurs. To unstale this issue, add a comment with detailed explanation. Thank you for your contributions :heart:

shkup commented 3 years ago

Hi. I read your related article. In it you give a short definition for CloudEvents: schemaFormat: 'application/cloudevents+json; version=0.2; charset=utf-8'. Can you please clarify this definition? Do I have to create a file named cloudevents.json and put the CloudEvents fields there in JSON format? Or is this definition parsed and magically understood by the AsyncAPI parser?

camhashemi commented 2 years ago

Any updates on this? It seems like a critical feature for an async API schema to natively document the message envelope.

fmvilas commented 2 years ago

I'm gonna work on this for v3.0.0, that's all I can say so far. No plans to have it for v2.x.

fmvilas commented 2 years ago

> that's all I can say so far

Just for the record, I meant "I haven't really looked at it in detail yet so don't know". Hope it didn't look like I'm hiding information or something 😅

JemDay commented 1 year ago

It would be nice if there was a way to describe the CE metadata/context without having to also understand the way that information is encoded into the payload (structured mode) or transport headers (binary mode). Also note that when using structured JSON (application/cloudevents+json) the business data may appear in either 'data' or 'data_base64'.

Also note that CEs allow for events that only contain metadata, i.e., no business data.

dret commented 1 year ago

This feels like "HTTP header fields for events", i.e., a way to get some agreement on the syntax and semantics of how events work. It feels very powerful and relevant, but also rather tricky because unlike HTTP, we don't have a shared foundation that's designed for openness and extensibility. I sincerely hope that this will come to fruition at some point!

rogierhofboer commented 1 year ago

I am also stumbling upon this. In case the payload for the message MyMessage is:

{  
  "field1": "value1",
  "field2": "value1"
} 

This envelope can be used:

{
  "type": "MyMessage",
  "data": {
    "field1": "value1",
    "field2": "value1"
  }
}

or this envelope:

{
  "MyMessage": {
    "field1": "value1",
    "field2": "value1"
  }
}

And these are only 2 examples.

It would be nice not to specify the wrapping as part of the payload, but as a separate concern. In this case there should be some way to reference the type of the message. This could be the messageId. Does anyone have any (new) thoughts on this?

fmvilas commented 1 year ago

I haven't touched this issue in a while because we're working on the v3 release but this is one of my favorites for v3.1. The way I'd do it is similar to what we already have in the spec regarding correlationId. You pass whatever you want as the envelope and specify the path in which the data is contained. For instance, roughly something like this:

messages:
  MyMessage:
    envelopes:
      payload: # <- This is the payload definition (could be extracted from current message payload definition)
        type: object
        properties:
          type:
            type: string
          data:
            type: object
      dataPath: '#/data' # <- This is to indicate where the "data" field is

We can then also have headers inside envelopes. Not perfect but simple enough I think. What do you think?
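
For comparison, a rough sketch of both pieces side by side. The `correlationId` part uses the existing `location` runtime expression from the spec. `envelopes` and `dataPath` are the hypothetical fields from the snippet above, applied here to the second envelope shape from the previous comment (the one keyed by the message name). The `correlationId` header name is an assumption for illustration.

```yaml
messages:
  MyMessage:
    # Existing spec feature: a runtime expression pointing into the message.
    correlationId:
      location: '$message.header#/correlationId'
    headers:
      type: object
      properties:
        correlationId:
          type: string
    # Hypothetical, as proposed above: not part of the spec.
    envelopes:
      payload:
        type: object
        properties:
          MyMessage:
            type: object
      dataPath: '#/MyMessage' # where the wrapped data lives inside the envelope
```

Only `dataPath` changes between envelope shapes; the wrapped data itself would be described once, separately from the wrapper.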