docs: Added transaction to kafka bindings as a first draft

pkonopacki1 commented 3 months ago

Description

Information about transacional topic is important from the consumer perspective as it requires appropriate properties being setup on its side _(isolation.level='readcommitted'). Moreover, property like commit.interval.ms might affect latency on consumer side. This is why I propose adding transaction configuration object to kafka bindings. I'm curious about your opinion on that topic.

dalelane commented 3 months ago

In principle, I agree with the objective here - knowing if a producer is using transactions is important information, and would be good to include somewhere in the spec.

As for the specifics of where to put this, I feel that - as a description of a producer behaviour - it would be better to put this information in the operation binding (rather than the channel binding where you have proposed it). This is just my personal opinion though - I'm interested to see what others think.

chrispatmore commented 3 months ago

Throwing in my 2 cents:

I agree that knowing whether or not a particular channel is transactional, or used in a transactional way is important. From the POV of an application implementing the AsyncAPI I think two things are important to know:

If they should filter out aborted transactions from a particular channel (use read_committed).
If a transaction should be performed between two channels in the document. e.g. the application should receive from channel A, and send to channel B via a transaction.

It looks like the proposal addresses case 1, which is simpler and a good start in this area.

In my personal opinion, this belongs in the receiver operation binding. i.e. the AsyncAPI document says this application should read only committed transactions from the referenced channel. Rather than saying this channel may contain aborted transactions.

Additionally, I wonder if this fits better with how problem 2 may be solved, which I would propose as a new action alongside send / receive. To describe a specific single operation the application should perform using two channels. Interested to hear what others think about this area

pkonopacki1 commented 3 months ago

Throwing in my 2 cents:

I agree that knowing whether or not a particular channel is transactional, or used in a transactional way is important. From the POV of an application implementing the AsyncAPI I think two things are important to know:

If they should filter out aborted transactions from a particular channel (use read_committed).

If a transaction should be performed between two channels in the document. e.g. the application should receive from channel A, and send to channel B via a transaction.

It looks like the proposal addresses case 1, which is simpler and a good start in this area.

In my personal opinion, this belongs in the receiver operation binding. i.e. the AsyncAPI document says this application should read only committed transactions from the referenced channel. Rather than saying this channel may contain aborted transactions.

Additionally, I wonder if this fits better with how problem 2 may be solved, which I would propose as a new action alongside send / receive. To describe a specific single operation the application should perform using two channels. Interested to hear what others think about this area

So, are you proposing that the transacional options should contain field like "read_committed: true"? I'm not sure if this is necessary, because this is the result of using transaction and can be concluded from the fact of using transaction itself.

Regarding mapping more channels within transactions options, I was thinking about that but could not think of a good solution. This would require mapping on API root level, otherwise all channels would have to be mapped several times on each channel itself. And this also does not solve the cross-api transactions, which are possible.

chrispatmore commented 3 months ago

The way I see it, the AysncAPI document exists as a set of instructions for how a client application should behave. and in this case we are dealing with the issue of a channel containing non commited messages (due to a transaction) and that because of this the consumer should be told whether it should read only commited messages or whether it should accept uncommitted ones.

This is a choice for the client that we should enable document writers to be specific about, it may be that they want the client to read non commited messages from this topic for some reason for example.

Additionally I am not sure how we can include transaction.id in the spec, as, as I understand it this must be unique to a client to avoid zombie instances breaking exactly once semantics for the transaction

I guess the way I am thinking about it is this (written as much for my own clarity as anything else):

There are three relevant client operations for this problem:

Client receiving messages from a channel which contains uncommited messages
- Relevant point IMO is should said client receive the uncommited messages or not
Client sending messages to a channel which contains uncommited messages
- I don't think this case matters / is particularly important, could be wrong
Client performs a transaction, receiving from 1..n channels and sending to 1..n channels as part of it
- There is no capacity to describe this at all at the moment in the document
- I consider a transaction to be a single operation (akin to receive / send) as it's intended to be an atomic action
- things relevant to the client here would be:
  - what channels (from what servers (different protocols?)) should the client receive from, and what channels should the client send to
  - how often should it try to commit

pkonopacki1 commented 2 months ago

The way I see it, the AysncAPI document exists as a set of instructions for how a client application should behave. and in this case we are dealing with the issue of a channel containing non commited messages (due to a transaction) and that because of this the consumer should be told whether it should read only commited messages or whether it should accept uncommitted ones.

This is a choice for the client that we should enable document writers to be specific about, it may be that they want the client to read non commited messages from this topic for some reason for example.

Additionally I am not sure how we can include transaction.id in the spec, as, as I understand it this must be unique to a client to avoid zombie instances breaking exactly once semantics for the transaction

I guess the way I am thinking about it is this (written as much for my own clarity as anything else):

There are three relevant client operations for this problem:

Client receiving messages from a channel which contains uncommited messages

Relevant point IMO is should said client receive the uncommited messages or not

Client sending messages to a channel which contains uncommited messages

I don't think this case matters / is particularly important, could be wrong

Client performs a transaction, receiving from 1..n channels and sending to 1..n channels as part of it

There is no capacity to describe this at all at the moment in the document

I consider a transaction to be a single operation (akin to receive / send) as it's intended to be an atomic action

things relevant to the client here would be:

what channels (from what servers (different protocols?)) should the client receive from, and what channels should the client send to

how often should it try to commit

You are right, I though initially that consumer should always read only committed messages but it could be other case also. But than in the end, the binding could be as simple as:

opeartions
  operationName:
    bindings:
      kafka:
        transaction:
          transactional: true
          readCommitted: true

chrispatmore commented 2 months ago

Is the extra field transactional: true implicit in having transaction.readCommitted = true / false or some other field in there being present (if there were other fields)?

pkonopacki1 commented 2 months ago

Hmm, it would be implicitly true therefore maybe this one is not needed.

asyncapi / bindings

docs: Added transaction to kafka bindings as a first draft #257