PostHog / posthog

🦔 PostHog provides open-source product analytics, session recording, feature flagging and A/B testing that you can self-host.
https://posthog.com
Other
20.71k stars 1.24k forks source link

Read-only tables on zookeeper #12312

Open jandolezal71 opened 1 year ago

jandolezal71 commented 1 year ago

Bug description

Hi, my self hosted posthog instance is struggling with read-only tables on start. Instance worked like a charm for 1 month and after nowhere server will not bootup, because zookeeper will throw errors with read-only tables. I have everything on my dedicated server. Restarts of everything did not helped.

2022.10.17 09:09:11.338159 [ 390 ] {} <Error> posthog.person (bb28534e-3cc3-4e68-9af6-e40094527a19): void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/tables/noshard/posthog.person). (TABLE_IS_READ_ONLY), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xb371dfa in /usr/bin/clickhouse
1. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xb3de7e4 in /usr/bin/clickhouse
2. DB::StorageReplicatedMergeTree::assertNotReadonly() const @ 0x15fde319 in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::mutationsFinalizingTask() @ 0x1604fac9 in /usr/bin/clickhouse
4. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x14f0452e in /usr/bin/clickhouse
5. DB::BackgroundSchedulePool::threadFunction() @ 0x14f06ea7 in /usr/bin/clickhouse
6. ? @ 0x14f07eb0 in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xb419257 in /usr/bin/clickhouse
8. ? @ 0xb41cddd in /usr/bin/clickhouse
9. ? @ 0x7fd946173609 in ?
10. __clone @ 0x7fd946098133 in ?
 (version 22.3.7.28 (official build))
2022.10.17 09:09:11.338331 [ 473 ] {} <Error> posthog.sharded_session_recording_events (c338e4ca-d39a-4299-860f-476c37b27660): void DB::StorageReplicatedMergeTree::mutationsFinalizingTask(): Code: 242. DB::Exception: Table is in readonly mode (zookeeper path: /clickhouse/tables/01/posthog.session_recording_events). (TABLE_IS_READ_ONLY), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xb371dfa in /usr/bin/clickhouse
1. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0xb3de7e4 in /usr/bin/clickhouse
2. DB::StorageReplicatedMergeTree::assertNotReadonly() const @ 0x15fde319 in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::mutationsFinalizingTask() @ 0x1604fac9 in /usr/bin/clickhouse
4. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x14f0452e in /usr/bin/clickhouse
5. DB::BackgroundSchedulePool::threadFunction() @ 0x14f06ea7 in /usr/bin/clickhouse
6. ? @ 0x14f07eb0 in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xb419257 in /usr/bin/clickhouse
8. ? @ 0xb41cddd in /usr/bin/clickhouse
9. ? @ 0x7fd946173609 in ?
10. __clone @ 0x7fd946098133 in ?
 (version 22.3.7.28 (official build))
2022.10.17 09:09:11.897294 [ 391 ] {} <Error> StorageKafka (kafka_person_distinct_id): Consumer error: Broker: Unknown topic or partition
2022.10.17 09:09:11.897426 [ 391 ] {} <Error> StorageKafka (kafka_person_distinct_id): There were 1 messages with an error
2022.10.17 09:09:11.897459 [ 391 ] {} <Error> StorageKafka (kafka_person_distinct_id): Only errors left

Environment

Any ideas what i can do?

guidoiaquinti commented 1 year ago

👋 Hi! Can you please share more info about your setup? Are you running K8s or the hobby deploy? Can you please share the Zookeeper logs?

jandolezal71 commented 1 year ago

Hi Guido! Thank for reply.

I wanted to run whole stack on my single machine without k8s, so yes, its extracted from hobby project. Here are zookeper logs.

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2022-10-14 12:39:42,503 [myid:] - INFO  [main:QuorumPeerConfig@174] - Reading configuration from: /conf/zoo.cfg
2022-10-14 12:39:42,507 [myid:] - INFO  [main:QuorumPeerConfig@435] - clientPort is not set
2022-10-14 12:39:42,507 [myid:] - INFO  [main:QuorumPeerConfig@448] - secureClientPort is not set
2022-10-14 12:39:42,507 [myid:] - INFO  [main:QuorumPeerConfig@464] - observerMasterPort is not set
2022-10-14 12:39:42,508 [myid:] - INFO  [main:QuorumPeerConfig@481] - metricsProvider.className is org.apache.zookeeper.metrics.impl.DefaultMetricsProvider
2022-10-14 12:39:42,514 [myid:] - ERROR [main:QuorumPeerConfig@706] - Invalid configuration, only one server specified (ignoring)
2022-10-14 12:39:42,517 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2022-10-14 12:39:42,517 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2022-10-14 12:39:42,517 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2022-10-14 12:39:42,518 [myid:1] - WARN  [main:QuorumPeerMain@139] - Either no config or no quorum defined in config, running in standalone mode
2022-10-14 12:39:42,521 [myid:1] - INFO  [main:ManagedUtil@44] - Log4j 1.2 jmx support found and enabled.
2022-10-14 12:39:42,528 [myid:1] - INFO  [main:QuorumPeerConfig@174] - Reading configuration from: /conf/zoo.cfg
2022-10-14 12:39:42,529 [myid:1] - INFO  [main:QuorumPeerConfig@435] - clientPort is not set
2022-10-14 12:39:42,529 [myid:1] - INFO  [main:QuorumPeerConfig@448] - secureClientPort is not set
2022-10-14 12:39:42,529 [myid:1] - INFO  [main:QuorumPeerConfig@464] - observerMasterPort is not set
2022-10-14 12:39:42,529 [myid:1] - INFO  [main:QuorumPeerConfig@481] - metricsProvider.className is org.apache.zookeeper.metrics.impl.DefaultMetricsProvider
2022-10-14 12:39:42,531 [myid:1] - ERROR [main:QuorumPeerConfig@706] - Invalid configuration, only one server specified (ignoring)
2022-10-14 12:39:42,531 [myid:1] - INFO  [main:ZooKeeperServerMain@123] - Starting server
2022-10-14 12:39:42,542 [myid:1] - INFO  [main:ServerMetrics@62] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.impl.DefaultMetricsProvider@2357d90a
2022-10-14 12:39:42,545 [myid:1] - INFO  [main:DigestAuthenticationProvider@47] - ACL digest algorithm is: SHA1
2022-10-14 12:39:42,545 [myid:1] - INFO  [main:DigestAuthenticationProvider@61] - zookeeper.DigestAuthenticationProvider.enabled = true
2022-10-14 12:39:42,548 [myid:1] - INFO  [main:FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
2022-10-14 12:39:42,559 [myid:1] - INFO  [main:ZookeeperBanner@42] -
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -   ______                  _
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -  |___  /                 | |
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -     / /    ___     ___   | | __   ___    ___   _ __     ___   _ __
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -    / /    / _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__|
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -   / /__  | (_) | | (_) | |   <  |  __/ |  __/ | |_) | |  __/ | |
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -  /_____|  \___/   \___/  |_|\_\  \___|  \___| | .__/   \___| |_|
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -                                               | |
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -                                               |_|
2022-10-14 12:39:42,560 [myid:1] - INFO  [main:ZookeeperBanner@42] -
2022-10-14 12:39:42,561 [myid:1] - INFO  [main:Environment@98] - Server environment:zookeeper.version=3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
2022-10-14 12:39:42,561 [myid:1] - INFO  [main:Environment@98] - Server environment:host.name=e99a0813966c
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.version=11.0.12
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.vendor=Oracle Corporation
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.home=/usr/local/openjdk-11
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.class.path=/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.7.0-bin/bin/../build/classes:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-prometheus-metrics-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-jute-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/snappy-java-1.1.7.7.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-log4j12-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-api-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_servlet-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_hotspot-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_common-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-unix-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-epoll-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-resolver-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-handler-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-codec-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-buffer-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/metrics-core-4.1.12.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jline-2.14.6.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-ajax-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-servlet-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-server-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-security-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-io-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-http-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-databind-2.10.5.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-core-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-annotations-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/commons-cli-1.4.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/audience-annotations-0.12.0.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.io.tmpdir=/tmp
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:java.compiler=<NA>
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:os.name=Linux
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:os.arch=amd64
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:os.version=5.10.0-13-amd64
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:user.name=zookeeper
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:user.home=/home/zookeeper
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:user.dir=/apache-zookeeper-3.7.0-bin
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.free=984MB
2022-10-14 12:39:42,562 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.max=1000MB
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:Environment@98] - Server environment:os.memory.total=1000MB
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@138] - zookeeper.enableEagerACLCheck = false
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@151] - zookeeper.digest.enabled = true
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@155] - zookeeper.closeSessionTxn.enabled = true
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@1499] - zookeeper.flushDelay=0
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@1508] - zookeeper.maxWriteQueuePollTime=0
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@1517] - zookeeper.maxBatchSize=1000
2022-10-14 12:39:42,563 [myid:1] - INFO  [main:ZooKeeperServer@260] - zookeeper.intBufferStartingSizeBytes = 1024
2022-10-14 12:39:42,564 [myid:1] - INFO  [main:BlueThrottle@141] - Weighed connection throttling is disabled
2022-10-14 12:39:42,565 [myid:1] - INFO  [main:ZooKeeperServer@1300] - minSessionTimeout set to 4000
2022-10-14 12:39:42,565 [myid:1] - INFO  [main:ZooKeeperServer@1309] - maxSessionTimeout set to 40000
2022-10-14 12:39:42,566 [myid:1] - INFO  [main:ResponseCache@45] - getData response cache size is initialized with value 400.
2022-10-14 12:39:42,566 [myid:1] - INFO  [main:ResponseCache@45] - getChildren response cache size is initialized with value 400.
2022-10-14 12:39:42,567 [myid:1] - INFO  [main:RequestPathMetricsCollector@109] - zookeeper.pathStats.slotCapacity = 60
2022-10-14 12:39:42,568 [myid:1] - INFO  [main:RequestPathMetricsCollector@110] - zookeeper.pathStats.slotDuration = 15
2022-10-14 12:39:42,568 [myid:1] - INFO  [main:RequestPathMetricsCollector@111] - zookeeper.pathStats.maxDepth = 6
2022-10-14 12:39:42,568 [myid:1] - INFO  [main:RequestPathMetricsCollector@112] - zookeeper.pathStats.initialDelay = 5
2022-10-14 12:39:42,568 [myid:1] - INFO  [main:RequestPathMetricsCollector@113] - zookeeper.pathStats.delay = 5
2022-10-14 12:39:42,568 [myid:1] - INFO  [main:RequestPathMetricsCollector@114] - zookeeper.pathStats.enabled = false
2022-10-14 12:39:42,570 [myid:1] - INFO  [main:ZooKeeperServer@1536] - The max bytes for all large requests are set to 104857600
2022-10-14 12:39:42,570 [myid:1] - INFO  [main:ZooKeeperServer@1550] - The large request threshold is set to -1
2022-10-14 12:39:42,571 [myid:1] - INFO  [main:AuthenticationHelper@66] - zookeeper.enforce.auth.enabled = false
2022-10-14 12:39:42,571 [myid:1] - INFO  [main:AuthenticationHelper@67] - zookeeper.enforce.auth.schemes = []
2022-10-14 12:39:42,571 [myid:1] - INFO  [main:ZooKeeperServer@361] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 clientPortListenBacklog -1 datadir /datalog/version-2 snapdir /data/version-2
2022-10-14 12:39:42,600 [myid:1] - INFO  [main:Log@169] - Logging initialized @496ms to org.eclipse.jetty.util.log.Slf4jLog
2022-10-14 12:39:42,671 [myid:1] - WARN  [main:ContextHandler@1660] - o.e.j.s.ServletContextHandler@292b08d6{/,null,STOPPED} contextPath ends with /*
2022-10-14 12:39:42,672 [myid:1] - WARN  [main:ContextHandler@1671] - Empty contextPath
2022-10-14 12:39:42,689 [myid:1] - INFO  [main:Server@375] - jetty-9.4.38.v20210224; built: 2021-02-24T20:25:07.675Z; git: 288f3cc74549e8a913bf363250b0744f2695b8e6; jvm 11.0.12+7
2022-10-14 12:39:42,719 [myid:1] - INFO  [main:DefaultSessionIdManager@334] - DefaultSessionIdManager workerName=node0
2022-10-14 12:39:42,719 [myid:1] - INFO  [main:DefaultSessionIdManager@339] - No SessionScavenger set, using defaults
2022-10-14 12:39:42,721 [myid:1] - INFO  [main:HouseKeeper@132] - node0 Scavenging every 660000ms
2022-10-14 12:39:42,724 [myid:1] - WARN  [main:ConstraintSecurityHandler@759] - ServletContext@o.e.j.s.ServletContextHandler@292b08d6{/,null,STARTING} has uncovered http methods for path: /*
2022-10-14 12:39:42,732 [myid:1] - INFO  [main:ContextHandler@916] - Started o.e.j.s.ServletContextHandler@292b08d6{/,null,AVAILABLE}
2022-10-14 12:39:42,751 [myid:1] - INFO  [main:AbstractConnector@331] - Started ServerConnector@12468a38{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-10-14 12:39:42,752 [myid:1] - INFO  [main:Server@415] - Started @651ms
2022-10-14 12:39:42,752 [myid:1] - INFO  [main:JettyAdminServer@181] - Started AdminServer on address 0.0.0.0, port 8080 and command URL /commands
2022-10-14 12:39:42,757 [myid:1] - INFO  [main:ServerCnxnFactory@169] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2022-10-14 12:39:42,758 [myid:1] - WARN  [main:ServerCnxnFactory@309] - maxCnxns is not configured, using default value 0.
2022-10-14 12:39:42,760 [myid:1] - INFO  [main:NIOServerCnxnFactory@652] - Configuring NIO connection handler with 10s sessionless connection timeout, 4 selector thread(s), 80 worker threads, and 64 kB direct buffers.
2022-10-14 12:39:42,761 [myid:1] - INFO  [main:NIOServerCnxnFactory@660] - binding to port /0.0.0.0:2181
2022-10-14 12:39:42,776 [myid:1] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2022-10-14 12:39:42,776 [myid:1] - INFO  [main:WatchManagerFactory@42] - Using org.apache.zookeeper.server.watch.WatchManager as watch manager
2022-10-14 12:39:42,777 [myid:1] - INFO  [main:ZKDatabase@133] - zookeeper.snapshotSizeFactor = 0.33
2022-10-14 12:39:42,777 [myid:1] - INFO  [main:ZKDatabase@153] - zookeeper.commitLogCount=500
2022-10-14 12:39:42,791 [myid:1] - INFO  [main:SnapStream@61] - zookeeper.snapshot.compression.method = CHECKED
2022-10-14 12:39:42,793 [myid:1] - INFO  [main:FileSnap@85] - Reading snapshot /data/version-2/snapshot.ab5e0
2022-10-14 12:39:42,811 [myid:1] - INFO  [main:DataTree@1709] - The digest in the snapshot has digest version of 2, , with zxid as 0xab5e3, and digest value as 2900396925382
2022-10-14 12:39:42,884 [myid:1] - INFO  [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2022-10-14 12:39:42,886 [myid:1] - INFO  [main:FileTxnSnapLog@372] - 682 txns loaded in 65 ms
2022-10-14 12:39:42,887 [myid:1] - INFO  [main:ZKDatabase@290] - Snapshot loaded in 109 ms, highest zxid is 0xab88a, digest is 2107719724138
2022-10-14 12:39:42,888 [myid:1] - INFO  [main:FileTxnSnapLog@479] - Snapshotting: 0xab88a to /data/version-2/snapshot.ab88a
2022-10-14 12:39:42,902 [myid:1] - INFO  [main:ZooKeeperServer@543] - Snapshot taken in 13 ms
2022-10-14 12:39:42,911 [myid:1] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@137] - PrepRequestProcessor (sid:0) started, reconfigEnabled=false
2022-10-14 12:39:42,911 [myid:1] - INFO  [main:RequestThrottler@75] - zookeeper.request_throttler.shutdownTimeout = 10000
2022-10-14 12:39:42,929 [myid:1] - INFO  [main:ContainerManager@84] - Using checkIntervalMs=60000 maxPerMinute=10000 maxNeverUsedIntervalMs=0
2022-10-14 12:39:47,503 [myid:1] - WARN  [NIOWorkerThread-5:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:37816; will be dropped if server is in r-o mode
2022-10-14 12:39:47,505 [myid:1] - INFO  [SyncThread:0:FileTxnLog@284] - Creating new log file: log.ab88b
2022-10-14 12:40:14,460 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@628] - Expiring session 0x1000a77c0340004, timeout of 30000ms exceeded
2022-10-14 12:41:10,877 [myid:1] - WARN  [NIOWorkerThread-58:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:37816, session = 0x101c66078cc0000
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-14 12:41:12,228 [myid:1] - WARN  [NIOWorkerThread-59:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:37822; will be dropped if server is in r-o mode
2022-10-14 12:41:36,969 [myid:1] - WARN  [NIOWorkerThread-40:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:37822, session = 0x101c66078cc0001
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-14 12:41:37,658 [myid:1] - WARN  [NIOWorkerThread-41:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:37894; will be dropped if server is in r-o mode
2022-10-14 12:41:40,459 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@628] - Expiring session 0x101c66078cc0000, timeout of 30000ms exceeded
2022-10-17 07:05:35,639 [myid:1] - INFO  [Snapshot Thread:FileTxnSnapLog@479] - Snapshotting: 0xaba2c to /data/version-2/snapshot.aba2c
2022-10-17 07:05:35,644 [myid:1] - INFO  [SyncThread:0:FileTxnLog@284] - Creating new log file: log.aba2e
2022-10-17 07:05:35,647 [myid:1] - INFO  [Snapshot Thread:ZooKeeperServer@543] - Snapshot taken in 8 ms
2022-10-17 07:05:44,607 [myid:1] - WARN  [NIOWorkerThread-25:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:37894, session = 0x101c66078cc0002
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-17 07:05:45,958 [myid:1] - WARN  [NIOWorkerThread-61:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:47080; will be dropped if server is in r-o mode
2022-10-17 07:06:11,524 [myid:1] - WARN  [NIOWorkerThread-46:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:47080, session = 0x101c66078cc0003
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-17 07:06:12,325 [myid:1] - WARN  [NIOWorkerThread-68:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:47150; will be dropped if server is in r-o mode
2022-10-17 07:06:14,460 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@628] - Expiring session 0x101c66078cc0002, timeout of 30000ms exceeded
2022-10-17 07:08:33,467 [myid:1] - WARN  [NIOWorkerThread-79:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:47150, session = 0x101c66078cc0004
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-17 07:08:34,821 [myid:1] - WARN  [NIOWorkerThread-73:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:48034; will be dropped if server is in r-o mode
2022-10-17 07:09:00,358 [myid:1] - WARN  [NIOWorkerThread-10:NIOServerCnxn@371] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /172.18.0.1:48034, session = 0x101c66078cc0007
    at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:170)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:333)
    at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:508)
    at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
2022-10-17 07:09:01,272 [myid:1] - WARN  [NIOWorkerThread-12:ZooKeeperServer@1411] - Connection request from old client /172.18.0.1:48104; will be dropped if server is in r-o mode
2022-10-17 07:09:04,459 [myid:1] - INFO  [SessionTracker:ZooKeeperServer@628] - Expiring session 0x101c66078cc0004, timeout of 30000ms exceeded
guidoiaquinti commented 1 year ago

Are there any ERROR logs in Zookeeper? Can you please verify the dataset in Zookeeper?

ClickHouse table in read only mode usually happens during network issue, ZK xid overflow, ZK restart or no metadata left in ZK. Take a look here.

jandolezal71 commented 1 year ago

Can you please verify the dataset in Zookeeper ?

I don't follow, what and where should i check? I tried to run these command in clickhouse web gui:

SYSTEM RESTART REPLICA posthog.session_recording_events;
SYSTEM RESTORE REPLICA posthog.session_recording_events;

but clickhouse said Code: 36. DB::Exception: Table posthog.session_recording_events (a1e9a369-edf1-44d3-97db-50cde952825a) is not replicated. (BAD_ARGUMENTS) (version 22.3.7.28 (official build)).

And also, you want to tell me i need to do this everytime network issue occurs?

guidoiaquinti commented 1 year ago

I don't follow, what and where should i check?

You should check if the dataset is still in Zookeeper. I have the feeling the data might be not there anymore in your setup.

And also, you want to tell me i need to do this everytime network issue occurs?

On reliable networks this should never occur. On unreliable networks tho, this is true but that's also how ClickHouse works. Zookeeper is a dependency and CH needs it in order to work, otherwise will set tables in RO mode.

posthog-contributions-bot[bot] commented 1 year ago

This issue has 2029 words at 5 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:

  1. Write some code and submit a pull request! Code wins arguments
  2. Have a sync meeting to reach a conclusion
  3. Create a Request for Comments and submit a PR with it to the meta repo or product internal repo

Is this issue intended to be sprawling? Consider adding label epic or sprint to indicate this.

voarsh2 commented 1 year ago

I am also having this same issue, and a number of tables in CH appear readonly:

select database,table from system.replicas where is_readonly;

SELECT
    database,
    table
FROM system.replicas
WHERE is_readonly

Query id: 795f1cc8-77b1-4579-bf1a-5de2ade7795c

┌─database─┬─table────────────────────────────┐
│ posthog  │ cohortpeople                     │
│ posthog  │ events_dead_letter_queue         │
│ posthog  │ groups                           │
│ posthog  │ infi_clickhouse_orm_migrations   │
│ posthog  │ person                           │
│ posthog  │ person_distinct_id               │
│ posthog  │ person_distinct_id2              │
│ posthog  │ person_static_cohort             │
│ posthog  │ plugin_log_entries               │
│ posthog  │ sharded_events                   │
│ posthog  │ sharded_session_recording_events │
└──────────┴──────────────────────────────────┘

11 rows in set. Elapsed: 0.054 sec. 

I was trying to edit the recording TTL that's different than 3 weeks.

I've also tried:

SYSTEM RESTORE REPLICA cohortpeople; 
SYSTEM RESTORE REPLICA events_dead_letter_queue; 
SYSTEM RESTORE REPLICA groups; 
SYSTEM RESTORE REPLICA infi_clickhouse_orm_migrations; 
SYSTEM RESTORE REPLICA person; 
SYSTEM RESTORE REPLICA person_distinct_id; 
SYSTEM RESTORE REPLICA person_distinct_id2; 
SYSTEM RESTORE REPLICA person_static_cohort; 
SYSTEM RESTORE REPLICA plugin_log_entries; 
SYSTEM RESTORE REPLICA sharded_events; 
SYSTEM RESTORE REPLICA sharded_session_recording_events; 

The usual output is:

Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Replica path is present at /clickhouse/tables/0/posthog.session_recording_events/replicas/chi-posthog-posthog-0-0 - nothing to restore. If you are sure that metadata is lost and that replica path contains some garbage, then use SYSTEM DROP REPLICA query first.. (BAD_ARGUMENTS)

Occasionally when running again, I see:

Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Replica must be readonly. (BAD_ARGUMENTS)
chi-posthog-posthog-0-0-0.chi-posthog-posthog-0-0.posthog.svc.cluster.local :) select database,table from system.replicas where is_readonly;

SELECT
    database,
    table
FROM system.replicas
WHERE is_readonly

Query id: 883af365-356c-4608-a990-7f45662de088

┌─database─┬─table────────────────────────────┐
│ posthog  │ sharded_session_recording_events │
└──────────┴──────────────────────────────────┘

1 rows in set. Elapsed: 0.007 sec. 

But now, I am not sure what to do if dataset is not in Zookeeper. FYI, zero errors/output from Zookeeper:

zookeeper 16:40:54.36
--
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.37 Welcome to the Bitnami zookeeper container
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.37 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-zookeeper
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.37 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-zookeeper/issues
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.37
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.37 INFO ==> ** Starting ZooKeeper setup **
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.38 WARN ==> You have set the environment variable ALLOW_ANONYMOUS_LOGIN=yes. For safety reasons, do not use this flag in a production environment.
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.39 INFO ==> Initializing ZooKeeper...
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.40 INFO ==> No injected configuration file found, creating default config files...
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.43 INFO ==> No additional servers were specified. ZooKeeper will run in standalone mode...
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.43 INFO ==> Deploying ZooKeeper with persisted data...
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.43 INFO ==> ** ZooKeeper setup finished! **
Wed, Oct 19 2022 5:40:54 pm |  
Wed, Oct 19 2022 5:40:54 pm | zookeeper 16:40:54.44 INFO ==> ** Starting ZooKeeper **
Wed, Oct 19 2022 5:40:54 pm | /opt/bitnami/java/bin/java
Wed, Oct 19 2022 5:40:54 pm | ZooKeeper JMX enabled by default
Wed, Oct 19 2022 5:40:54 pm | Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Wed, Oct 19 2022 5:40:55 pm | Removing file: Oct 19, 2022, 12:08:14 PM /bitnami/zookeeper/data/version-2/log.8e84e
Wed, Oct 19 2022 5:40:55 pm | Removing file: Oct 19, 2022, 12:08:14 PM /bitnami/zookeeper/data/version-2/snapshot.9197f

Ref: https://app.slack.com/client/TT68H7WHY/C01GLBKHKQT/thread/C01GLBKHKQT-1666199307.600689

voarsh2 commented 1 year ago

FYI, I did a fresh install again and changed the desired recording time to 5 years. I find that again I am unable to change the setting a day later. Kafka and Zookeper have all been online with no restarts since the fresh install - the issue has happened again.