confluentinc / cp-helm-charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
https://cnfl.io/getting-started-kafka-kubernetes
Apache License 2.0

when replicas and server set to 1 : cp-schema-registry-server CrashLoopBackOff #509

Open survivant opened 3 years ago

survivant commented 3 years ago

I'm deploying the platform to test it, and one of my pods is stuck in a crash loop.

I'm using this version:

- name: cp-helm-charts
  version: 0.6.0
  repository: "@confluentinc"
  condition: kafka.enabled
  alias: kafka

values.yaml

kafka:
  enabled: true

  cp-zookeeper:
    servers: 1
    persistence:
      enabled: false

  cp-kafka:
    brokers: 1
    persistence:
      enabled: false

  cp-kafka-connect:
    configurationOverrides:
      "config.storage.replication.factor": "1"
      "offset.storage.replication.factor": "1"
      "status.storage.replication.factor": "1"

Here are my running pods:

kafka-cp-control-center-84ddf9f746-8dsx8                   1/1     Running            3          19m
kafka-cp-kafka-0                                           2/2     Running            0          19m
kafka-cp-kafka-connect-59ddf47868-2nfzt                    2/2     Running            0          19m
kafka-cp-kafka-rest-685dc8fd7-pc4k7                        2/2     Running            1          19m
kafka-cp-ksql-server-685f845d-grn9c                        2/2     Running            0          19m
kafka-cp-schema-registry-5d699b7f4d-6jqwd                  1/2     CrashLoopBackOff   6          19m
kafka-cp-zookeeper-0                                       2/2     Running            0          19m

Part of the logs from: kubectl logs kafka-cp-schema-registry-5d699b7f4d-6jqwd cp-schema-registry-server


[2021-04-07 17:40:33,830] INFO Wait to catch up until the offset at 3 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-04-07 17:40:34,328] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2021-04-07 17:40:34,431] INFO Kafka version: 6.0.1-ce (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,431] INFO Kafka commitId: 5e516110bd85c6e3 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,431] INFO Kafka startTimeMs: 1617817234431 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,540] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Cluster ID: Se2sXcG_RO2dCt5F-aM_Hw (org.apache.kafka.clients.Metadata)
[2021-04-07 17:41:34,531] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:308)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:73)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:88)
        at io.confluent.rest.Application.configureHandler(Application.java:254)
        at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:196)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector.init(KafkaGroupLeaderElector.java:210)
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:303)
        ... 6 more
OneCricketeer commented 3 years ago

cp-schema-registry:
    enabled: true
    configurationOverrides:
      "kafkastore.topic.replication.factor" : 1

survivant commented 3 years ago

@OneCricketeer it looks like that doesn't work. The pod is still in CrashLoopBackOff.

We see this warning: WARN The configuration 'topic.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)

[2021-04-08 12:09:58,174] WARN The configuration 'group.id' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
[2021-04-08 12:09:58,175] WARN The configuration 'topic.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.producer.ProducerConfig)
[2021-04-08 12:09:58,268] INFO Kafka version: 6.0.1-ce (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 12:09:58,268] INFO Kafka commitId: 5e516110bd85c6e3 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 12:09:58,268] INFO Kafka startTimeMs: 1617883798268 (org.apache.kafka.common.utils.AppInfoParser)
OneCricketeer commented 3 years ago

That's only a warning and can be ignored, so it isn't the error that is crashing the container.

survivant commented 3 years ago

OK, but the error is back to this:

[2021-04-07 17:40:33,830] INFO Wait to catch up until the offset at 3 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-04-07 17:40:34,328] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2021-04-07 17:40:34,431] INFO Kafka version: 6.0.1-ce (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,431] INFO Kafka commitId: 5e516110bd85c6e3 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,431] INFO Kafka startTimeMs: 1617817234431 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 17:40:34,540] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Cluster ID: Se2sXcG_RO2dCt5F-aM_Hw (org.apache.kafka.clients.Metadata)
[2021-04-07 17:41:34,531] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:308)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:73)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:88)
        at io.confluent.rest.Application.configureHandler(Application.java:254)
        at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:196)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector.init(KafkaGroupLeaderElector.java:210)
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:303)
        ... 6 more

[2021-04-07 17:40:33,830] INFO Wait to catch up until the offset at 3 (io.confluent.kafka.schemaregistry.storage.KafkaStore)

io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:308)

OneCricketeer commented 3 years ago

That's a timeout exception, and the log says the _schemas topic has at least 3 records on it, which means the solution I gave fixed your first error.

survivant commented 3 years ago

It's weird. If I use 3, Schema Registry doesn't crash.

OneCricketeer commented 3 years ago

I can't really say why that would be, other than a race condition. To resolve the timeouts, you can increase both kafkastore.timeout.ms and kafkastore.init.timeout.ms.
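
As a rough sketch of that suggestion, both timeouts could be raised through the chart's configurationOverrides. This assumes the chart is aliased as kafka, as in the values.yaml earlier in the thread, so the overrides are nested under that key. The millisecond values are arbitrary illustrations, not recommendations.

```yaml
kafka:
  cp-schema-registry:
    configurationOverrides:
      # Illustrative values only (assumptions); tune them for your cluster.
      "kafkastore.timeout.ms": "10000"
      "kafkastore.init.timeout.ms": "120000"
```

The logs above show roughly 60 seconds between "Joining schema registry with Kafka-based coordination" and the timeout error, so an init timeout longer than that is the kind of value to try first.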

survivant commented 3 years ago

I added more RAM and CPU to Schema Registry and I still get this:

[2021-04-08 13:54:14,750] INFO Kafka version: 6.1.0-ce (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:14,750] INFO Kafka commitId: 958ad0f3c7030f1c (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:14,750] INFO Kafka startTimeMs: 1617890054750 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:14,771] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=kafka] Cluster ID: wEwA3HejQkyvS6upfPlteg (org.apache.kafka.clients.Metadata)
[2021-04-08 13:54:14,773] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=kafka] Subscribed to partition(s): _schemas-0 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2021-04-08 13:54:14,776] INFO Seeking to beginning for all partitions (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-04-08 13:54:14,777] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=kafka] Seeking to EARLIEST offset of partition _schemas-0 (org.apache.kafka.clients.consumer.internals.SubscriptionState)
[2021-04-08 13:54:14,777] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-04-08 13:54:14,779] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-04-08 13:54:14,868] INFO [Consumer clientId=KafkaStore-reader-_schemas, groupId=kafka] Resetting offset for partition _schemas-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka-cp-kafka-1.kafka-cp-kafka-headless.default:9092 (id: 1 rack: null)], epoch=0}}. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
[2021-04-08 13:54:15,006] INFO Wait to catch up until the offset at 2 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-04-08 13:54:15,033] INFO Reached offset at 2 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-04-08 13:54:15,034] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2021-04-08 13:54:15,080] INFO Kafka version: 6.1.0-ce (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:15,080] INFO Kafka commitId: 958ad0f3c7030f1c (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:15,080] INFO Kafka startTimeMs: 1617890055080 (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-08 13:54:15,093] INFO [Schema registry clientId=sr-1, groupId=schema-registry] Cluster ID: wEwA3HejQkyvS6upfPlteg (org.apache.kafka.clients.Metadata)
[2021-04-08 13:55:15,084] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:319)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:73)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:88)
        at io.confluent.rest.Application.configureHandler(Application.java:255)
        at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:227)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
        at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
        at io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector.init(KafkaGroupLeaderElector.java:212)
        at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:314)

I'll try the timeout settings

winterelf commented 2 years ago

You can add:

configurationOverrides: 
    "schema.registry.group.id": "something"

This fixed it for me.
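
For a setup like the one in the original post, where the chart is aliased as kafka, this override would presumably sit under the cp-schema-registry key inside that alias. A minimal sketch under that assumption, with "something" kept as the placeholder from above:

```yaml
kafka:
  cp-schema-registry:
    configurationOverrides:
      # "something" is a placeholder; substitute your own group id.
      "schema.registry.group.id": "something"
```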