PostHog / charts-clickhouse

Helm chart for deploying PostHog with ClickHouse on your K8s infrastructure
MIT License

external redis setup failed #359

Closed: visla-xugeng closed this issue 2 years ago

visla-xugeng commented 2 years ago

Bug description

I tried to set up external Redis following the instructions:

redis:
  enabled: false

externalRedis:
  host: "posthog.cache.us-east-1.amazonaws.com"
  port: 6379
  password: xxxxxx

Expected behavior

I expected PostHog to use this external Redis.

Actual behavior

PostHog still tried to connect to the internal Redis.

How to reproduce

Environment

I deployed it through helm chart in my aws eks environment.

Additional context

I checked the templates and found that _snippet-redis-env.tpl does not cover the external Redis case; it only contains the internal Redis settings. This tpl is included by several other templates, such as events-deployment.yaml and plugins-deployment.yaml. As a result, none of these pods will connect to the external Redis, even though I disabled the internal one.

How can I handle this situation?

visla-xugeng commented 2 years ago

Below is a partial capture from describing the events pod (from the output, it still points to the internal Redis):


Environment:
      POSTHOG_REDIS_HOST:        posthog-posthog-redis-master
      POSTHOG_REDIS_PORT:        6379
guidoiaquinti commented 2 years ago

👋 Hi @visla-xugeng. I'm sorry to hear about the issue you are experiencing.

Which version of the Helm chart are you using? Support for an external Redis is available since version 12.0.0.

> I checked the templates and found that _snippet-redis-env.tpl does not cover the external Redis case; it only contains the internal Redis settings. This tpl is included by several other templates, such as events-deployment.yaml and plugins-deployment.yaml. As a result, none of these pods will connect to the external Redis, even though I disabled the internal one.

This is true but also a bit misleading: _snippet-redis-env.tpl is indeed shared by all deployments, but that's because we handle the internal vs. external Redis logic directly in the main helper file: https://github.com/PostHog/charts-clickhouse/blob/d168b0f53f7540f00471dbc23c4fec8c0af2ca53/charts/posthog/templates/_helpers.tpl#L66-L151
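For readers following along, a helper of this shape typically resolves the Redis host by branching on `.Values.redis.enabled`. This is a simplified sketch, not the chart's actual code, and the define name is illustrative:

```
{{/* Illustrative sketch: resolve the Redis host used by every deployment */}}
{{- define "posthog.redis.host" -}}
{{- if .Values.redis.enabled -}}
{{- printf "%s-redis-master" .Release.Name -}}
{{- else -}}
{{- .Values.externalRedis.host -}}
{{- end -}}
{{- end -}}
```

Because the snippet templates call a helper like this rather than hard-coding a host, one shared _snippet-redis-env.tpl can serve both the internal and external Redis cases.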

I've tried to reproduce the issue you are describing without success:

  1. custom values file

         redis:
           enabled: false

         externalRedis:
           host: "posthog.cache.us-east-1.amazonaws.com"
           port: 6379
           password: xxxxxx

  2. rendered the templates with `helm template --output-dir '/tmp/posthog' --set "cloud=private" -f ./test.yaml ./charts/posthog --debug`

  3. I didn't find a single entry called redis-master; they are all rendered as value: posthog.cache.us-east-1.amazonaws.com.
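The check in steps 2–3 can be scripted as a quick grep over the rendered output (paths and the sample manifest below are illustrative):

```shell
# After rendering the chart, e.g.:
#   helm template --output-dir /tmp/posthog --set "cloud=private" \
#     -f ./test.yaml ./charts/posthog
# grep the output to confirm every deployment points at the external Redis.
# Here we fake one rendered file so the check itself is reproducible:
mkdir -p /tmp/posthog
cat > /tmp/posthog/events-deployment.yaml <<'EOF'
        - name: POSTHOG_REDIS_HOST
          value: posthog.cache.us-east-1.amazonaws.com
EOF

# No rendered manifest should still reference the bundled redis-master
if grep -R "redis-master" /tmp/posthog; then
  echo "still pointing at internal Redis"
else
  echo "external Redis everywhere"
fi
```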

Can you please double check the values.yaml file you are using? Happy if you can share it with us so that we can 🔎 this further.

Thank you! I hope you have a lovely start of the week 🙇

visla-xugeng commented 2 years ago

@guidoiaquinti Thank you for your explanation. After double checking, I think the issue is not on your side. Let me give you a bit of background on this case: my coworker forked your repo and built his own chart, using a GitHub Action to deploy PostHog in our k8s. We have not figured out why the newly built chart could not connect to the external Redis cluster.

The reason we had to fork your repo is that the official Helm chart does not work as expected: the pods cannot be initialized when we use it. (screenshot: Screenshot2022_05_02_212353)

From the screenshot, you can see that most of the pods are stuck in the Init state. After checking the log of clickhouse-operator-8cff468-8wsdt:

E0503 03:27:34.906936       1 worker.go:1195] deleteCHI():posthog/posthog:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:posthog:clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope

My coworker manually updated templates/clickhouse-operator/clusterrole.yaml after forking the repo. Are you aware of this permission issue? How can we fix it without forking the repo?

guidoiaquinti commented 2 years ago

> Thank you for your explanation. After double checking, I think the issue is not on your side. Let me give you a bit of background on this case: my coworker forked your repo and built his own chart, using a GitHub Action to deploy PostHog in our k8s. We have not figured out why the newly built chart could not connect to the external Redis cluster.
>
> The reason we had to fork your repo is that the official Helm chart does not work as expected: the pods cannot be initialized when we use it.
>
> From the screenshot, you can see that most of the pods are stuck in the Init state. After checking the log of clickhouse-operator-8cff468-8wsdt:
>
>     E0503 03:27:34.906936       1 worker.go:1195] deleteCHI():posthog/posthog:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:posthog:clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
>
> My coworker manually updated templates/clickhouse-operator/clusterrole.yaml after forking the repo. Are you aware of this permission issue? How can we fix it without forking the repo?

I'm glad we've sorted out that the Redis config is not an issue.

Regarding the CRD issue you are experiencing, can you please share some more info about your setup?

I'm asking because the error you posted looks like an RBAC issue to me (clickhouse-operator can't fetch CRDs), and it is specific to your cluster setup (like the Redis issue you experienced above).
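One way to confirm the RBAC theory directly on the cluster (requires kubectl access; the ServiceAccount name is taken from the error message above):

```shell
# Check whether the operator's ServiceAccount may read CRDs cluster-wide
kubectl auth can-i get customresourcedefinitions \
  --as=system:serviceaccount:posthog:clickhouse-operator
# "no" means the ClusterRole/ClusterRoleBinding is missing the CRD read
# permission, matching the "is forbidden" error in the operator log
```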

visla-xugeng commented 2 years ago

@guidoiaquinti Based on my coworker's comments, he just updated this file, charts/posthog/templates/clickhouse-operator/clusterrole.yaml:

- apiGroups:
    - apiextensions.k8s.io
  resources:
    - customresourcedefinitions
  verbs:
    - get
    - list

2: We are using AWS as the provider:

cloud: "aws"

3: We do not use any other RBAC rules. Let's set aside the forked repo for now and focus on your official repo. I installed the official chart on EKS (AWS) via Terraform. I double checked my values.yaml for the chart; most values are the defaults. I only bring my own ingress (ALB), an external PostgreSQL DB, and an external Redis cluster. Not sure why so many pods fail to come up. The clickhouse-operator log shows the error below:

E0504 01:05:25.733706       1 worker.go:1195] deleteCHI():posthog/posthog:unable to get CRD, got error: customresourcedefinitions.apiextensions.k8s.io "clickhouseinstallations.clickhouse.altinity.com" is forbidden: User "system:serviceaccount:posthog:clickhouse-operator" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope

(This is also the reason my coworker had to fork the chart and modify the ClusterRole.)

During the installation, I did not touch any CRDs.

4: One more question about cert-manager. In one of my testing environments, I had cert-manager installed before PostHog, and I got the errors below when installing PostHog. What's the purpose of cert-manager here? Can PostHog share the same cert-manager with other applications?


│ Error: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "certificates.cert-manager.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-name" must equal "posthog": current value is "cert-manager"; annotation validation error: key "meta.helm.sh/release-namespace" must equal "posthog": current value is "cert-manager"
│
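As a possible workaround for this conflict: if the chart exposes a toggle for its bundled cert-manager (check the chart's values.yaml; the key below is an assumption, not confirmed in this thread), you could keep using the pre-existing cluster-wide installation instead:

```yaml
# Hypothetical values.yaml fragment: disable the chart's bundled
# cert-manager so it does not collide with one already installed.
# Verify the exact key against the chart's values.yaml first.
cert-manager:
  enabled: false
```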
posthog-contributions-bot[bot] commented 2 years ago

This issue has 2685 words at 6 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:

  1. Write some code and submit a pull request! Code wins arguments
  2. Have a sync meeting to reach a conclusion
  3. Create a Request for Comments and submit a PR with it to the meta repo or product internal repo

Is this issue intended to be sprawling? Consider adding label epic or sprint to indicate this.

visla-xugeng commented 2 years ago

@guidoiaquinti Do you think we can have a Zoom meeting to discuss this one?

guidoiaquinti commented 2 years ago

> Based on my coworker's comments, he just updated this file, charts/posthog/templates/clickhouse-operator/clusterrole.yaml

The resource you are updating comes from https://github.com/Altinity/clickhouse-operator and it shouldn't be necessary to change it.

> Not sure why so many pods fail to come up

Can you please look at the pod logs? I bet they are all waiting for CH to be up before starting.

> I installed the official chart on EKS (AWS) via Terraform.

  1. Can you please send me the Helm command you are using?
  2. Can you please send me the output of kubectl get crd ?

> One more question about cert-manager. In one of my testing environments, I had cert-manager installed before PostHog, and I got the errors below when installing PostHog. What's the purpose of cert-manager here? Can PostHog share the same cert-manager with other applications?

We currently support PostHog installations only on brand new clusters. Please do not try to install PostHog on clusters with resources already provisioned, as those might conflict with the installation (and it will be more difficult for us to help you troubleshoot).

> Do you think we can have a Zoom meeting to discuss this one?

Take a look at our support options at: https://posthog.com/support

visla-xugeng commented 2 years ago

@guidoiaquinti

> Can you please send me the Helm command you are using?

I am using the Terraform `helm_release` resource to install the chart:

    resource "helm_release" "posthog" {
      repository = "https://posthog.github.io/charts-clickhouse/"
      name       = "posthog"
      chart      = "posthog"
      version    = "18.3.1"

      namespace = kubernetes_namespace.posthog.metadata.name
      timeout   = 1200

      values = compact([local.values_posthog])

      set {
        name  = "externalPostgresql.postgresqlHost"
        value = var.external_postgresql_endpoint
        type  = "string"
      }

      set_sensitive {
        name  = "externalPostgresql.postgresqlPassword"
        value = var.posthog_service_password
        type  = "string"
      }

      set {
        name  = "externalPostgresql.postgresqlUsername"
        value = "posthog_service"
        type  = "string"
      }
    }


> Can you please send me the output of `kubectl get crd`?

    $ kubectl get crds
    NAME                                                       CREATED AT
    certificaterequests.cert-manager.io                        2022-05-04T16:30:50Z
    certificates.cert-manager.io                               2022-05-04T16:30:51Z
    challenges.acme.cert-manager.io                            2022-05-04T16:30:51Z
    clickhouseinstallations.clickhouse.altinity.com            2022-05-03T01:55:49Z
    clickhouseinstallationtemplates.clickhouse.altinity.com    2022-05-03T01:55:50Z
    clickhouseoperatorconfigurations.clickhouse.altinity.com   2022-05-03T01:55:51Z
    clusterissuers.cert-manager.io                             2022-05-04T16:30:54Z
    clustertriggerauthentications.keda.sh                      2022-04-06T20:32:44Z
    eniconfigs.crd.k8s.amazonaws.com                           2022-04-06T18:47:38Z
    externalsecrets.kubernetes-client.io                       2022-04-06T20:32:35Z
    horizontalrunnerautoscalers.actions.summerwind.dev         2022-04-22T23:48:09Z
    ingressclassparams.elbv2.k8s.aws                           2022-04-06T20:32:32Z
    issuers.cert-manager.io                                    2022-05-04T16:30:56Z
    orders.acme.cert-manager.io                                2022-05-04T16:30:57Z
    runnerdeployments.actions.summerwind.dev                   2022-04-22T23:48:13Z
    runnerreplicasets.actions.summerwind.dev                   2022-04-22T23:48:16Z
    runners.actions.summerwind.dev                             2022-04-22T23:48:18Z
    runnersets.actions.summerwind.dev                          2022-04-22T23:48:20Z
    scaledjobs.keda.sh                                         2022-04-06T20:32:44Z
    scaledobjects.keda.sh                                      2022-04-06T20:32:44Z
    securitygrouppolicies.vpcresources.k8s.aws                 2022-04-06T18:47:40Z
    targetgroupbindings.elbv2.k8s.aws                          2022-04-06T20:32:32Z
    triggerauthentications.keda.sh                             2022-04-06T20:32:44Z

guidoiaquinti commented 2 years ago

> Can you please send me the Helm command you are using?

That doesn't look like a Helm command to me but rather Terraform DSL. Can you send us the underlying command generated by the code?

Your CRDs look good to me, but you are definitely running PostHog on a fairly custom cluster with other resources unrelated to PostHog. We unfortunately do not support this kind of deployment as of today, for the reasons discussed above.

Thank you!