datastax / kaap

KAAP, Kubernetes Autoscaling for Apache Pulsar
https://datastax.github.io/kaap/
Apache License 2.0
45 stars 15 forks source link

Thoughts and feedback #44

Open ddieruf opened 1 year ago

ddieruf commented 1 year ago

Config as json

I know most components have quite a list of possible configurations. I see the "config" option of each component in the API spec is formatted as json. So, my assumption is that all defaults for the given component follow Puslar documentation - this operator is not changing anything. And to override a value, one would follow Pulsar's spec.

If I wanted to turn off non-persistent topic creation

PulsarCluster:
  spec:
    broker:
      config: "{\"enableNonPersistentTopics\":false}"

Or I could change the broker service port (which would break things)

PulsarCluster:
  spec:
    broker:
      config: "{\"enableNonPersistentTopics\":false,\"brokerServicePort\":3333}"

It seems that there are quite a few configuration values that you would want the operator to explicitly control and not give someone the ability to change (like the brokerServicePort). Would it be better to declare all possible config values individually? Then the operator can be more obvious about what it supports and possibly override Pulsar's default values in a documentable way. Also, because there are so many config values, declaring them as one json string is error-prone and very hard to read.

There are many cases in pulsar components where K8s has external provisions (like Service ports and Ingress). Leaving config as a json string will put the operator in a place where it either A)is forced to silently overwrite values without telling anyone or B)hope that the person deploying doesn't decide to provide a bad value for certain things.

Logging

How/where would I set log4j values? Say I wanted to turn up logging on my bookies, where would I provide log4j.appender.CONSOLE.Threshold=DEBUG ?

Values decision tree

I am assuming that the operator assigns values in the following order:

  1. individual component param
  2. global param
  3. pulsar default

Cert issuer

I'd like to offer a different way of thinking about a certificate issuer. Currently, there are provisions for self-signed using cert-manager or passwords to "Override the default secret name from where to load the certificates". But what if I wanted to use a different issuer (enterprise or letsencrypt or cloud-based)? Instead of being so explicit why not accept any Issuer or ClusterIssuer object in the operator? The the stack chart can have provisions for setting up a self-signed issuer and provide the object to the operator. That way the limiting factor is the use of cert-manager but the Issuer and its configuration is left to the person deploying this.

Consolidate TLS values

PulsarCluster.spec. PulsarCluster.spec.global.tls. PulsarCluster.spec.global.tls.certProvisioner.selfSigned.

Could there be some consolidation of values? Maybe each component has PulsarCluster.spec.<component>.tls and then there is a global PulsarCluster.spec.global.tls.

Helm charts

Having a helm chart for the operator and the stack seems redundant. How about removing the operator helm chart? So your options are to create a cluster using the operator resources directly kubectl apply PulsarCluster.yaml or create a cluster (with optional add-ons) with stack chart helm install my-cluster datastax/pulsar-stack -y values.yaml.

Documentation

Descriptions in the CRD are essential. There are so many different tools one can use to manage K8s objects, the CRD is the only common thread to guarantee a good experience. From there one would look to the operator repo which should have the API reference and some simple getting started commands/yaml. OperatorHub and ArtifactHub can point back to the repo. Then in Luna Streaming docs, we can provide deep knowledge about the operator, how it works, its benefits, and many examples/guides.

pgier commented 1 year ago
PulsarCluster:
  spec:
    broker:
      config: "{\"enableNonPersistentTopics\":false}"

Why can't we just use yaml for this?

PulsarCluster:
  spec:
    broker:
      config:
        enableNonPersistentTopics: false

The config should just be a generic map so that we don't need to change the CRDs every time a config option is updated.

ddieruf commented 1 year ago

That's definitely a step in the right direction. I am just worried about allowing bad values. Or unknown overwrites happening.

nicoloboschi commented 1 year ago

Config as json

The config is done via yaml and it's a map. The user has the whole control over all the configuration entries passed to pulsar. You can see an example here: https://github.com/riptano/pulsar-operator/blob/0def79ff66b5998cdc8139dec1f23063e0e57395/helm/examples/tiered-storage/values.yaml#L48-L57 The operator adds some entries based on other parameters (auth, tls..) but they're all overridable via the config section. To me it's fundamental that the operator doesn't add another layer of configuration; this would add a lot of complexity for the user to understand all the steps. Bad values are not checked but this should be handled in Pulsar and not in the operator. However it's a common case where you add a property introduced in a new Pulsar version. It's good to be able to rollback the pulsar version without also reverting the config entry.

Logging

You can set the log level with

broker:
  config:
    PULSAR_LOG_LEVEL: debug
    PULSAR_LOG_ROOT_LEVEL: debug
    PULSAR_EXTRA_OPTS: -Dlog4j.appender.CONSOLE.Threshold=DEBUG

I agree we should have a dedicated config and document it nicely.

Values decision tree

It depends. Some values are both in the global and per-component section and they're overridable/composable.

Cert issuer

Cert manager in the stack operator is for convenience. Cert manager in the pulsar operator is for having a easy way to generate self signed certificates. However the operator is not tightly coupled with it. You can set the secret name to use and ask to not provision any certificate/issuer. Actually you can use an existing cert-manager instance instead of the one provided by the chart.

I agree we shouldn't stick with cert-manager itself. I'm going to add an example with let's encrypt in the next days.

Consolidate TLS values

One detail which is very important (and it might be limiting in some cases) is that the operator is composed by many controllers (one per component). Each controller can read only his own section (e.g. zookeeper, broker) and global. In case of TLS, there are cases where you need to know if tls is enabled on other components. That's why the TLS section is global.

Helm charts

This would reduce the overall complexity for the users. My current opinion is that they could have different lifecycle (versioning) and keeping them separate is a more elegant solution. I think we'll see how it goes in the next months.

Documentation

Sure thing. We will do that once we go public

michaeljmarshall commented 1 year ago

That's definitely a step in the right direction. I am just worried about allowing bad values. Or unknown overwrites happening.

One of the challenges with trying to validate values is that custom plugins in the broker mean that there are valid configuration values that we won't know about in the operator.