apache / camel-k

Apache Camel K is a lightweight integration platform, born on Kubernetes, with serverless superpowers
https://camel.apache.org/camel-k
Apache License 2.0

KameletBinding: support for partitions for sources #1779

Open lburgazzoli opened 4 years ago

lburgazzoli commented 4 years ago

To improve performance, we should be able to partition a source. We should support static partitioning and, optionally, dynamic partitioning.

Static partitions

We could add the relevant information in the form of properties to a KameletBinding, something like:

apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: jdbc-source
spec:
  source: 
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: jdbc-source
    properties:
      partition-keys: <1>
        - table1
        - table2
        - table3
      partition-property: <2>
        - tables
  sink: 
    ref:
      kind: KafkaTopic
      apiVersion: kafka.strimzi.io/v1beta1
      name: my-topic
  1. list of values used to logically partition the source
  2. the property used by the underlying kamelet to get a reference to the partition keys it has to use

The operator would then detect the presence of the special properties partition-keys and partition-property and could create, for example, a StatefulSet of integrations where each integration receives only a subset of the partition-keys.
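As a rough illustration of how such a split could work, here is a minimal Go sketch that spreads the partition keys over the replicas of a StatefulSet in round-robin order, so that the replica with ordinal i gets every key whose index modulo the replica count equals i. The function name assignPartitions and the replica count are assumptions made for this example, not part of any existing operator code.

```go
package main

import "fmt"

// assignPartitions is a hypothetical helper: it distributes the partition
// keys round-robin, so key number i goes to replica i % replicas.
// The result is indexed by replica ordinal.
func assignPartitions(keys []string, replicas int) [][]string {
	if replicas <= 0 {
		return nil
	}
	out := make([][]string, replicas)
	for i, k := range keys {
		out[i%replicas] = append(out[i%replicas], k)
	}
	return out
}

func main() {
	// Partition keys taken from the KameletBinding example above.
	keys := []string{"table1", "table2", "table3"}

	// Two replicas is an arbitrary choice for the example.
	for ordinal, subset := range assignPartitions(keys, 2) {
		fmt.Printf("jdbc-source-%d -> %v\n", ordinal, subset)
	}
}
```

Each integration would then receive its subset through the property named by partition-property. A round-robin split is only one option; it keeps the assignment stable as long as the key list and the replica count do not change.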

Dynamic partitions

Instead of hardcoding the values of the partitions, the operator could use a meta service provided by the kamelet to determine how a source can be partitioned.

nicolaferraro commented 4 years ago

I think we can keep using the same model that we have now, like:

apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: jdbc-source
spec:
  source: 
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: jdbc-source
    properties:
      tables:
      - table1
      - table2
      - table3
  sink: 
    ref:
      kind: KafkaTopic
      apiVersion: kafka.strimzi.io/v1beta1
      name: my-topic

The information that tells the operator that "tables" is a partition key is present on the kamelet, e.g. with an x-camel-partition-key flag in the descriptor.
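To make the idea concrete, here is a minimal Go sketch of how a tool could find the partition key properties once the flag is on the kamelet. The propertySchema type is a simplified, hypothetical stand-in for a property definition in the kamelet descriptor, and its PartitionKey field stands in for the proposed x-camel-partition-key flag; neither reflects the actual camel-k types.

```go
package main

import (
	"fmt"
	"sort"
)

// propertySchema is a hypothetical, simplified view of one property
// definition in a kamelet descriptor.
type propertySchema struct {
	Type string
	// PartitionKey stands in for the proposed x-camel-partition-key flag.
	PartitionKey bool
}

// partitionProperties returns the names of all properties flagged as
// partition keys, sorted so the result does not depend on map order.
func partitionProperties(props map[string]propertySchema) []string {
	var names []string
	for name, p := range props {
		if p.PartitionKey {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// "tables" comes from the binding above; "query" is made up here
	// to show a property that is not a partition key.
	props := map[string]propertySchema{
		"tables": {Type: "array", PartitionKey: true},
		"query":  {Type: "string"},
	}
	fmt.Println(partitionProperties(props))
}
```

With this model the KameletBinding stays unchanged, and a source is partitionable exactly when this lookup returns at least one property.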

The assignment of partitions and rebalancing can also be done with a modified version of the master component instead of StatefulSets, to make it more dynamic.

Wdyt?

heiko-braun commented 4 years ago

Using a specific partition-* property would allow us to distinguish between partitionable and non-partitionable sources.

This information might be useful to determine a rollover/redeployment strategy, for instance.

If the properties remain domain/kamelet specific, rather than being generalised, identifying partitionable sources might not easily be possible, would it?

lburgazzoli commented 4 years ago

@nicolaferraro I think we should support both the StatefulSet approach and the master-based one: the StatefulSet is simple to reason about, whereas the dynamic one may be a little tricky from an operations point of view.

@heiko-braun I think @nicolaferraro's proposal to add metadata to the kamelet is right, as it would let any tool act accordingly (e.g. by leveraging meta services to get a possible list of partitions). We could also add some additional info as labels, which would make it easy to search for them using standard k8s tools and practices (e.g. kubectl get kameletbindings -l type=partitionable).

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale due to 90 days of inactivity. It will be closed if no further activity occurs within 15 days. If you think that’s incorrect or the issue should never stale, please simply write any comment. Thanks for your contributions!