Open Leeacarroll opened 1 year ago
The recommendation would be to make N different configs:

```properties
name=connector-a
topics=a
aep.connection.endpoint.headers=a-headers
```

```properties
name=connector-b
topics=b
aep.connection.endpoint.headers=b-headers
```
Hi. The issue with running multiple connectors is cost when running on third-party SaaS offerings such as MSK. Effectively you end up running N serverless clusters rather than just one. MSK also limits the number of compacted partitions on a Kafka cluster, so that the cluster can only handle <4 connectors.
The above begins to add up to a valid use case. At the very least, a documentation change explaining when to use the "topics" (plural) property and when not to would be good.
I'd recommend using ECS over MSK Connect.
Compacted topics have nothing to do with running connectors.
Subject of the issue
This is an enhancement. Support for multiple topic ingestion is currently limited to the use case where all topics' messages share the same dataset ID, operation, dataflow ID, etc.
Messages should be filtered by topic and streamed into AEPPublisher.producer.post updates as topic-specific batches. Each topic batch could then have topic-specific headers appended, using something like this in the config:
```properties
aep.connection.endpoint.topic-x.headers=...
aep.connection.endpoint.topic-y.headers=...
aep.connection.endpoint.topic-z.headers=...
aep.connection.endpoint.headers=...
```
The current headers attribute could provide default/common headers, with the topic-specific headers overwriting them or adding new ones.
I could create a pull request for this if the committers are interested/supportive. The issues I'm concerned with are:
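A minimal sketch of the proposed lookup, assuming the config keys follow the pattern above (the `HeaderResolver` class and its method names are hypothetical, not part of the connector):

```java
import java.util.Map;

// Hypothetical helper illustrating the proposed fallback: prefer a
// topic-specific headers entry, otherwise use the common default.
public class HeaderResolver {
    private static final String PREFIX = "aep.connection.endpoint.";
    private static final String DEFAULT_KEY = PREFIX + "headers";

    private final Map<String, String> config;

    public HeaderResolver(Map<String, String> config) {
        this.config = config;
    }

    // For topic "topic-x", look up "aep.connection.endpoint.topic-x.headers"
    // first, then fall back to "aep.connection.endpoint.headers".
    public String headersFor(String topic) {
        String topicKey = PREFIX + topic + ".headers";
        return config.getOrDefault(topicKey, config.get(DEFAULT_KEY));
    }
}
```

With this shape, existing single-topic configs keep working unchanged, since they only set the default key.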
Your environment
All
Steps to reproduce
Set the property topics=a,b,c
where a, b, and c are topics whose messages have different AEP dataset IDs, or require different operations or flow IDs.
Observe that stitching, topic update logic, and values will be broken within AEP.
Expected behaviour
NA
Actual behaviour
NA