Specifying multiple topics in the topics property logically fails in AEP if topic messages need different dataset IDs, operations, stream flow IDs, etc.

Leeacarroll commented 1 year ago

Subject of the issue

This is an enhancement request. Support for multi-topic ingestion is currently limited to the use case where messages from all topics share the same dataset ID, operation, dataflow ID, etc.

Messages should be filtered by topic and streamed into AEPPublisher.producer.post in topic-specific batches. Each topic batch could then have topic-specific headers appended, using something like this in the config:

aep.connection.endpoint.topic-x.headers=...
aep.connection.endpoint.topic-y.headers=...
aep.connection.endpoint.topic-z.headers=...
aep.connection.endpoint.headers=...

The current headers attribute could provide default/common headers, with the topic-specific headers overwriting them or adding new ones.
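
A rough sketch of the idea, to make the proposal concrete. This is not the connector's actual code: the class and method names (TopicHeaderSketch, headersFor, groupByTopic) are hypothetical, records are simplified to (topic, payload) string pairs instead of Kafka SinkRecord objects, and the "name:value,name:value" header format is an assumption made only for illustration. The per-topic property keys follow the naming suggested above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-topic header resolution and per-topic batching.
final class TopicHeaderSketch {

    static final String PREFIX = "aep.connection.endpoint.";
    static final String SUFFIX = ".headers";

    // Parses "name:value,name:value" into a map (assumed format, for illustration only).
    static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return headers;
        }
        for (String pair : raw.split(",")) {
            int sep = pair.indexOf(':');
            if (sep > 0) {
                headers.put(pair.substring(0, sep).trim(), pair.substring(sep + 1).trim());
            }
        }
        return headers;
    }

    // Starts from the default/common headers, then lets topic-specific
    // headers overwrite existing entries or add new ones.
    static Map<String, String> headersFor(String topic, Map<String, String> config) {
        Map<String, String> merged = parseHeaders(config.get(PREFIX + "headers"));
        merged.putAll(parseHeaders(config.get(PREFIX + topic + SUFFIX)));
        return merged;
    }

    // Groups (topic, payload) pairs into one batch per topic, preserving arrival order.
    static Map<String, List<String>> groupByTopic(List<String[]> records) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String[] record : records) {
            batches.computeIfAbsent(record[0], t -> new ArrayList<>()).add(record[1]);
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("aep.connection.endpoint.headers", "x-common:1,x-adobe-flow-id:default");
        config.put("aep.connection.endpoint.topic-x.headers", "x-adobe-flow-id:flow-x");

        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"topic-x", "{\"id\":1}"});
        records.add(new String[] {"topic-y", "{\"id\":2}"});
        records.add(new String[] {"topic-x", "{\"id\":3}"});

        // One HTTP post per topic batch would go here; this sketch only prints.
        for (Map.Entry<String, List<String>> batch : groupByTopic(records).entrySet()) {
            System.out.println(batch.getKey() + " -> " + batch.getValue().size()
                    + " record(s), headers " + headersFor(batch.getKey(), config));
        }
    }
}
```

With this shape, a topic that has no topic-specific entry simply falls back to the common headers, so existing single-topic configs would keep working unchanged.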

I could create a pull request for this if the committers are interested / supportive. The issues I'm concerned about are:

How does this change play out in terms of connector performance? (Each set of sink records provided by Kafka would now produce zero to many HTTP requests to the AEP endpoint.)
Can we share the same auth token? (I think we can...)
How does this impact the configuration of the Kafka micro-batching parameters? (Maybe it doesn't.)

Your environment

All

Steps to reproduce

Set property topics=a,b,c

where a, b and c are topics whose messages have different AEP dataset IDs or require different operations or flow IDs.

Observe that stitching, topic update logic and values are broken within AEP.

Expected behaviour

NA

Actual behaviour

NA

OneCricketeer commented 1 year ago

The recommendation would be to make N different configs

name=connector-a
topics=a
aep.connection.endpoint.headers=a-headers

name=connector-b
topics=b
aep.connection.endpoint.headers=b-headers

Leeacarroll commented 1 year ago

Hi. The issue with running multiple connectors is the expense when running on third-party SaaS offerings such as MSK. Effectively you end up running x serverless clusters rather than just one. MSK also limits the number of compacted partitions on a Kafka cluster, so the cluster can only handle fewer than 4 connectors.

The above begins to add up to a valid use case. At the very least, a documentation change explaining when to use the "topics" (plural) property and when not to would be good.

OneCricketeer commented 1 year ago

I'd recommend using ECS over MSK Connect.

Compacted topics have nothing to do with running connectors
