apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0
1.91k stars 904 forks source link

InvalidCookieException #4155

Open jihwong opened 11 months ago

jihwong commented 11 months ago

BUG REPORT

Describe the bug

Error reported after restarting the bookkeeper node. Due to being a Kubernetes environment, it was successfully started after being restarted x times. Why is there a new instanceId df26468b-558a-41fc-8705-3d7dc21d2817

log

2023-12-12T10:03:04,513 - INFO  - [main-EventThread:ZooKeeperWatcherBase@130] - ZooKeeper client is connected now.
2023-12-12T10:03:04,650 - ERROR - [main:Main@218] - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: instanceId df26468b-558a-41fc-8705-3d7dc21d2817 is not matching with 67c327e1-d3d6-4e28-b731-85a9ae63920b
    at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:168) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:173) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.bookie.LegacyCookieValidation.verifyAndGetMissingDirs(LegacyCookieValidation.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.bookie.LegacyCookieValidation.checkCookies(LegacyCookieValidation.java:84) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.server.EmbeddedServer$Builder.build(EmbeddedServer.java:408) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:277) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.server.Main.doMain(Main.java:216) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
    at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.2.jar:4.16.2]
root@ubuntu:~# k get pod -n pulsar 
NAME                                     READY   STATUS      RESTARTS      AGE
bookie-sts-0                             1/1     Running     0             16h
bookie-sts-1                             1/1     Running     5 (16h ago)   16h
bookie-sts-2                             1/1     Running     0             16h

zookeeper

[zk: localhost:2181(CONNECTED) 117] get /ledgers/cookies/bookie-sts-1.bookie-svc.pulsar.svc.cluster.local:3181
5
bookieHost: "bookie-sts-1.bookie-svc.pulsar.svc.cluster.local:3181"
journalDir: "/bookkeeper/data/journal"
ledgerDirs: "1\t/bookkeeper/data/ledgers"
instanceId: "67c327e1-d3d6-4e28-b731-85a9ae63920b"
indexDirs: "1\t/bookkeeper/data/ledgers"

data dir

root@bookie-sts-0:/bookkeeper/data# cat ledgers/current/VERSION 
5
bookieHost: "bookie-sts-0.bookie-svc.pulsar.svc.cluster.local:3181"
journalDir: "/bookkeeper/data/journal"
ledgerDirs: "1\t/bookkeeper/data/ledgers"
instanceId: "67c327e1-d3d6-4e28-b731-85a9ae63920b"
indexDirs: "1\t/bookkeeper/data/ledgers"
root@bookie-sts-0:/bookkeeper/data# cat journal/current/VERSION 
5
bookieHost: "bookie-sts-0.bookie-svc.pulsar.svc.cluster.local:3181"
journalDir: "/bookkeeper/data/journal"
ledgerDirs: "1\t/bookkeeper/data/ledgers"
instanceId: "67c327e1-d3d6-4e28-b731-85a9ae63920b"
indexDirs: "1\t/bookkeeper/data/ledgers"

To Reproduce

Steps to reproduce the behavior:

  1. Restart the bookkeeper node

Expected behavior

Screenshots

Additional context

  1. kubernetes environment
  2. image: apache/bookkeeper:4.16.2
jihwong commented 11 months ago

I configured useHostNameAsBookieID=true, but it seems to be unrelated to the issue.

zwOvO commented 10 months ago

Assign a separate /bookkeeper/data directory for each k8s stat

such: /bookkeeper/bookie-sts-0-data /bookkeeper/bookie-sts-1-data /bookkeeper/bookie-sts-2-data

Ghatage commented 9 months ago

AFAIK InstanceId is only set when a new cluster is instantiated, not during restarts.

So you either have an installation which has an existing instance id in the registration manager, or your k8s init command is attempting to reinitialize the cluster.

Can you please check the logs for this line Initializing ZooKeeper metadata for new cluster. It should indicate a new cluster id is being set.

cc @eolivelli

shoothzj commented 6 months ago

closed by no updates, feel free to reopen it if you have new informations or way to reproduce.

leiless commented 2 months ago
2024-09-05T11:48:46,431+0800 [main] INFO  org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2024-09-05T11:48:46,597+0800 [main] INFO  org.apache.zookeeper.common.X509Util - Default TLS protocol is TLSv1.3, supported TLS protocols are [TLSv1.3, TLSv1.2, TLSv1.1, TLSv1, SSLv3, SSLv2Hello]
2024-09-05T11:48:46,604+0800 [main] INFO  org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 10485760 Bytes
2024-09-05T11:48:46,611+0800 [main] INFO  org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=false
2024-09-05T11:48:46,619+0800 [main-SendThread(shanghai:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server shanghai/172.21.0.1:2181.
2024-09-05T11:48:46,619+0800 [main-SendThread(shanghai:2181)] INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2024-09-05T11:48:46,627+0800 [main-SendThread(shanghai:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /172.21.0.4:55006, server: shanghai/172.21.0.1:2181
2024-09-05T11:48:46,643+0800 [main-SendThread(shanghai:2181)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server shanghai/172.21.0.1:2181, session id = 0x10003bc3feb0040, negotiated timeout = 30000
2024-09-05T11:48:46,647+0800 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
2024-09-05T11:48:46,718+0800 [main] INFO  org.apache.bookkeeper.bookie.UncleanShutdownDetectionImpl - Dirty files exist on boot-up indicating an unclean shutdown. Dirty files: /data/bookkeeper/ledgers/current/DIRTY
2024-09-05T11:48:46,718+0800 [main] INFO  org.apache.bookkeeper.server.EmbeddedServer$Builder - Unclean shutdown detected. The bookie did not register a graceful shutdown prior to this boot.
2024-09-05T11:48:46,855+0800 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: instanceId 84167942-9ab9-4c54-a8fd-8e3179417f97 is not matching with null
        at org.apache.bookkeeper.bookie.Cookie.verifyInternal(Cookie.java:168) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.bookie.Cookie.verify(Cookie.java:173) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.bookie.LegacyCookieValidation.readAndVerifyCookieFromRegistrationManager(LegacyCookieValidation.java:179) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.bookie.LegacyCookieValidation.checkCookies(LegacyCookieValidation.java:72) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.server.EmbeddedServer$Builder.build(EmbeddedServer.java:408) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:277) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.server.Main.doMain(Main.java:216) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
        at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]

I encounter the same issue here, I'm using Pulsar LTS 3.0.6