charlrvd opened this issue 2 years ago
Hey @charlrvd, can you please share your `values.yaml` (redacting any sensitive information)?
Sure, here it is:
```yaml
cloud: "aws"
ingress:
  hostname: posthog.domain
  nginx:
    enabled: true
  letsencrypt: false
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  secretName: posthog-tls-cert
cert-manager:
  enabled: false
clickhouse:
  resources:
    limits:
      cpu: 400m
      memory: 6Gi
    requests:
      cpu: 400m
      memory: 6Gi
  persistence:
    enabled: true
    storageClass: standard
    size: 100Gi
kafka:
  replicaCount: 3
  resources:
    requests:
      cpu: 100m
      memory: 2Gi
    limits:
      cpu: 250m
      memory: 2Gi
  livenessProbe:
    enabled: true
    initialDelaySeconds: 30
    timeoutSeconds: 15
    failureThreshold: 3
    periodSeconds: 30
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 15
    failureThreshold: 6
    timeoutSeconds: 15
    periodSeconds: 30
    successThreshold: 1
  persistence:
    storageClass: standard
    size: 50Gi
  metrics:
    kafka:
      enabled: true
redis:
  architecture: replication
  replica:
    replicaCount: 3
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 250m
        memory: 256Mi
  master:
    persistence:
      storageClass: standard
      size: 20Gi
zookeeper:
  persistence:
    storageClass: standard
    size: 20Gi
postgresql:
  enabled: false
externalPostgresql:
  postgresqlHost: posthog.rds.amazonaws.com
  postgresqlPort: 5432
  postgresqlDatabase: posthog
  postgresqlUsername: admin
  existingSecret: postgres-password
  existingSecretPasswordKey: password
grafana:
  enabled: true
prometheus:
  enabled: true
  alertmanager:
    persistentVolume:
      storageClass: standard
      size: 10Gi
  server:
    persistentVolume:
      storageClass: standard
      size: 100Gi
prometheus-kafka-exporter:
  enabled: true
prometheus-postgres-exporter:
  enabled: true
  config:
    datasource:
      host: posthog.rds.amazonaws.com
      user: admin
      database: posthog
      passwordSecret:
        name: postgres-password
        key: password
prometheus-redis-exporter:
  enabled: true
```
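The `postgres-password` secret referenced by `externalPostgresql` and the postgres exporter above is created out of band, along these lines (the namespace is illustrative, the real password is redacted):

```sh
kubectl create secret generic postgres-password \
  --namespace posthog \
  --from-literal=password='<redacted>'
```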
Mh, I realized that if I run the install in a brand-new namespace, with no pre-existing PVCs, then the install goes all right.
I think my issue is that I have pre-existing PVCs from an install that went nuts after an upgrade.
I upgraded, with the same process as usual, from 26.0.1 to 26.0.9. The upgrade worked, but for the first time in six months of running PostHog it created a new PV and PVC for clickhouse, so all the old data was missing. I created a PVC manually from the existing PV and set the value `clickhouse.persistence.existingClaim` to point at it.
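For reference, the manual rebind looked roughly like this; a sketch, where the claim name and namespace are illustrative, the PV name is whatever `kubectl get pv` shows for the old ClickHouse volume, and the PV's stale `claimRef` may need to be cleared before it will bind again:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clickhouse-data-old   # hypothetical name, referenced from values.yaml below
  namespace: posthog
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  volumeName: <old-pv-name>   # pin the claim to the pre-existing PV
  resources:
    requests:
      storage: 100Gi          # match the old PV's capacity
```

with the corresponding override in `values.yaml`:

```yaml
clickhouse:
  persistence:
    enabled: true
    existingClaim: clickhouse-data-old
```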
Then I ran the exact same command to upgrade to the version it was already on, but that failed with this error:
```
Set configuration api_version=(2, 5, 0) to skip auto check_version requests on startup
2022-08-01T23:37:58.098465Z [info ] Schema out of sync on some clickhouse nodes! [ee.management.commands.sync_replicated_schema] out_of_sync_hosts={'chi-posthog-posthog-0-0-0': {'person_distinct_id2_mv', 'groups_mv', 'events_dead_letter_queue', 'events_dead_letter_queue_mv', 'sharded_events', 'person', 'sharded_session_recording_events', 'events_json_mv', 'plugin_log_entries', 'groups', 'cohortpeople', 'person_static_cohort', 'session_recording_events_mv', 'plugin_log_entries_mv', 'person_distinct_id_mv', 'person_distinct_id2', 'person_mv'}}
2022-08-01T23:37:58.098828Z [info ] Creating missing tables [ee.management.commands.sync_replicated_schema] missing_tables={'person_distinct_id2_mv', 'groups_mv', 'events_dead_letter_queue', 'events_dead_letter_queue_mv', 'sharded_events', 'person', 'sharded_session_recording_events', 'events_json_mv', 'plugin_log_entries', 'groups', 'cohortpeople', 'person_static_cohort', 'session_recording_events_mv', 'person_mv', 'person_distinct_id_mv', 'person_distinct_id2', 'plugin_log_entries_mv'}
Traceback (most recent call last):
  File "manage.py", line 21, in <module>
    main()
  File "manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/home/posthog/code/ee/management/commands/sync_replicated_schema.py", line 52, in handle
    self.create_missing_tables(out_of_sync_hosts, create_table_queries)
  File "/home/posthog/code/ee/management/commands/sync_replicated_schema.py", line 99, in create_missing_tables
    query = create_table_queries[table]
KeyError: 'person_distinct_id2_mv'
Closing the Kafka producer with 0 secs timeout.
Proceeding to force close the producer since pending requests could not be completed within timeout 0.
```
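For what it's worth, the way I compared what the node actually had against what the migration expected was roughly this (pod name taken from the log above; depending on your setup you may need `-c clickhouse` to pick the container):

```sh
kubectl exec -n posthog chi-posthog-posthog-0-0-0 -- \
  clickhouse-client --query "SHOW TABLES FROM posthog"
```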
After that, since I couldn't seem to get it to run on the old data because the migrations were messed up, I decided to just reuse an empty disk. I ended up deleting the helm release to "start fresh" and left the undeleted PVCs, since I thought they'd be OK. And that's when I got the error in the original post.
Because it actually works in a blank namespace, I assume the leftover kafka (or maybe redis) data has something to do with the migrations and is causing my issues.
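For anyone else in this state: `helm uninstall` does not delete PVCs created through StatefulSet volume claim templates, so they have to be removed by hand if you really want a clean slate (the names below are placeholders for whatever your cluster shows):

```sh
kubectl get pvc -n posthog
kubectl delete pvc -n posthog <kafka-pvc-name> <zookeeper-pvc-name> <redis-pvc-name>
```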
> created a new PV and PVC for clickhouse
Do you happen to have the output from the helm command from when this happened? I would be interested to see why it didn't use the existing claim. I don't think there were any changes between 26.0.1 and 26.0.9 regarding PVCs.
I'll try to set aside some time to reproduce the issue on Monday, if that's OK? It's not the first time this has been an issue, and it's a pretty critical one.
Unfortunately I don't have the outputs anymore.
I am not sure whether this is a chart issue or maybe a Helm blip. It's the only chart I deploy via helm directly; the rest is handled by ArgoCD, and when ArgoCD uses helm it always does `template | apply`, so the behavior might be different.
But I had run upgrades on that chart fine more than a dozen times before. The only difference is that I pass `--version X.Y.Z`, so that I don't skip over a needed intermediate version if it's been a while since my last upgrade.
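Concretely, each upgrade is some variant of the following (release name, namespace, and repo alias are illustrative; the pinned version is the only thing that changes between runs):

```sh
helm repo update
helm upgrade posthog posthog/posthog \
  --namespace posthog \
  -f values.yaml \
  --version X.Y.Z
```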
Anytime is fine with me. The bug report is more informative now. I screwed up by waiting too long to set up real backups, and I hadn't dug enough into the way the data is stored, so even with my clickhouse disk it seems like I am missing data that must have been on the kafka disks, which I cleaned up recently.
Had the same issue while deploying to Docker Swarm. Fixed it by using the right hostname for zookeeper in clickhouse's config; all the migrations were then applied correctly.
Specifically here -> https://github.com/PostHog/posthog/blob/master/docker/clickhouse/config.xml#L695
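That section of `config.xml` is the ZooKeeper block. For anyone else on swarm, a sketch of what it needs to look like; the host below assumes your ZooKeeper service is literally named `zookeeper`, so substitute whatever your stack uses:

```xml
<zookeeper>
    <node index="1">
        <host>zookeeper</host>  <!-- must resolve to your ZooKeeper service -->
        <port>2181</port>
    </node>
</zookeeper>
```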
Can I get an update on this issue? I'm getting the same error while updating the image (upgrading the application). Is there any other way than deleting the PVCs or doing a fresh installation?
Can somebody tell me which Dockerfile this image is built from: https://hub.docker.com/r/posthog/posthog-foss? We are trying to self-host it. Thanks!
Hello.
I also ran into the same error while deploying this chart with an external ClickHouse from the official Helm chart.
This is a fresh installation with no data.
Every time the migrate job runs I get the following error:

```
infi.clickhouse_orm.database.ServerError: Code: 60. DB::Exception: Table posthog.infi_clickhouse_orm_migrations doesn't exist
```

Any idea how to fix this?
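The table in that error is the migration bookkeeping table that `infi.clickhouse_orm` creates on first run, so a sanity check worth running against the external ClickHouse is whether the `posthog` database exists and is reachable with the credentials the chart is using (host and credentials are placeholders):

```sh
clickhouse-client --host <external-clickhouse-host> --user <user> --password '<password>' \
  --query "SHOW DATABASES"
```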
Any update?
**Bug description**
A new installation of the chart fails on migrations. The deployment is therefore not usable, as the plugins, web, worker, and events pods do not pass the init phase.
The SQL migrations run fine, but soon after the following error happens.

**Expected behavior**
The installation completes after all the migrations run to completion.

**Actual behavior**
Migrations fail to run on a new deployment.
**How to reproduce**
Deploy the helm chart. I tried both `helm upgrade --install` and a plain `helm install`, but it fails with the same issues either way; see the commands below.
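Both attempts were equivalents of the following (release name and namespace are illustrative):

```sh
helm install posthog posthog/posthog --namespace posthog -f values.yaml
# or
helm upgrade --install posthog posthog/posthog --namespace posthog -f values.yaml
```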
**Environment**
- Kubernetes version (`kubectl version`): 1.22
- Chart version (`helm list -n posthog`): 26.0.1 to 26.2.0

**Additional context**