confluentinc / schema-registry

Confluent Schema Registry for Kafka
https://docs.confluent.io/current/schema-registry/docs/index.html

Error initializing kafka store while initializing schema registry #1252

Open ShowMeYourNoobs opened 4 years ago

ShowMeYourNoobs commented 4 years ago

Hello,

I have configured a Confluent cluster in Docker containers, consisting of 3 ZooKeepers, 3 Kafka brokers, and 3 Schema Registry instances. Sometimes, on deployment, all 3 Schema Registry instances fail to start. The problem appears very rarely, and only when deploying the cluster (as a Docker stack, from a yml file); when it does appear, redeploying the cluster fixes it. I should note that we use version 5.1.3 of Confluent Platform. In the log below you can see that several errors appear. Here is the log I get from one of the Schema Registry instances (all 3 of them produce the same log):

```
wait-for-it: waiting for kafka1:9092 without a timeout
wait-for-it: kafka1:9092 is available after 0 seconds
wait-for-it: waiting for kafka2:9092 without a timeout
wait-for-it: kafka2:9092 is available after 0 seconds
wait-for-it: waiting for kafka3:9092 without a timeout
wait-for-it: kafka3:9092 is available after 0 seconds
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
    bootstrap.servers = [kafka1:9092], client.dns.lookup = default, client.id = ,
    connections.max.idle.ms = 300000, metadata.max.age.ms = 300000, metric.reporters = [],
    metrics.num.samples = 2, metrics.recording.level = INFO, metrics.sample.window.ms = 30000,
    receive.buffer.bytes = 65536, reconnect.backoff.max.ms = 1000, reconnect.backoff.ms = 50,
    request.timeout.ms = 120000, retries = 5, retry.backoff.ms = 100,
    sasl.client.callback.handler.class = null, sasl.jaas.config = null,
    sasl.kerberos.kinit.cmd = /usr/bin/kinit, sasl.kerberos.min.time.before.relogin = 60000,
    sasl.kerberos.service.name = null, sasl.kerberos.ticket.renew.jitter = 0.05,
    sasl.kerberos.ticket.renew.window.factor = 0.8, sasl.login.callback.handler.class = null,
    sasl.login.class = null, sasl.login.refresh.buffer.seconds = 300,
    sasl.login.refresh.min.period.seconds = 60, sasl.login.refresh.window.factor = 0.8,
    sasl.login.refresh.window.jitter = 0.05, sasl.mechanism = GSSAPI,
    security.protocol = PLAINTEXT, send.buffer.bytes = 131072, ssl.cipher.suites = null,
    ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1],
    ssl.endpoint.identification.algorithm = https, ssl.key.password = null,
    ssl.keymanager.algorithm = SunX509, ssl.keystore.location = null,
    ssl.keystore.password = null, ssl.keystore.type = JKS, ssl.protocol = TLS,
    ssl.provider = null, ssl.secure.random.implementation = null,
    ssl.trustmanager.algorithm = PKIX, ssl.truststore.location = null,
    ssl.truststore.password = null, ssl.truststore.type = JKS
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1-cp3
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : c119bfeabe023775
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
    bootstrap.servers = [kafka2:9092]  [remaining values identical to the dump above]
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1-cp3
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : c119bfeabe023775
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
    bootstrap.servers = [kafka3:9092]  [remaining values identical to the dump above]
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1-cp3
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : c119bfeabe023775
[kafka-admin-client-thread | adminclient-1] INFO org.apache.kafka.clients.admin.internals.AdminMetadataManager - [AdminClient clientId=adminclient-1] Metadata update failed
org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.
[main] ERROR io.confluent.admin.utils.ClusterStatus - Error while getting broker list.
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
    at io.confluent.admin.utils.ClusterStatus.isKafkaReady(ClusterStatus.java:149)
    at io.confluent.admin.utils.cli.KafkaReadyCommand.main(KafkaReadyCommand.java:150)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
[main] INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again ...
[main] ERROR io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Brokers found [].
OK. All Kafkas are ready
===> ENV Variables ...
ALLOW_UNSIGNED=false
COMPONENT=schema-registry
CONFLUENT_DEB_VERSION=1
CONFLUENT_MAJOR_VERSION=5
CONFLUENT_MINOR_VERSION=1
CONFLUENT_MVN_LABEL=
CONFLUENT_PATCH_VERSION=3
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_VERSION=5.1.3
CUB_CLASSPATH=/etc/confluent/docker/docker-utils.jar
HOME=/root
HOSTNAME=704d321e9067
KAFKA_VERSION=2.1.1cp3
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
PYTHON_PIP_VERSION=8.1.2
PYTHON_VERSION=2.7.9-1
SCALA_VERSION=2.11
SCHEMA_REGISTRY_AVRO_COMPATIBILITY_LEVEL=none
SCHEMA_REGISTRY_HOST_NAME=schema-registry2
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=PLAINTEXT://kafka1:9092,PLAINTEXT://kafka2:9092,PLAINTEXT://kafka3:9092
SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081
SCHEMA_REGISTRY_ZK_NAMESPACE=sr
SHLVL=2
ZULU_OPENJDKVERSION=8=8.30.0.1
_=/usr/bin/env
===> User
uid=0(root) gid=0(root) groups=0(root)
===> Configuring ...
===> Running preflight checks ...
===> Check if Kafka is healthy ...
[main] INFO org.apache.kafka.clients.admin.AdminClientConfig - AdminClientConfig values:
    bootstrap.servers = [PLAINTEXT://kafka1:9092, PLAINTEXT://kafka2:9092, PLAINTEXT://kafka3:9092]  [remaining values identical to the dumps above]
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1-cp3
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : c119bfeabe023775
===> Launching ...
===> Launching schema-registry ...
[2019-12-05 07:29:35,054] INFO SchemaRegistryConfig values:
    resource.extension.class = [], metric.reporters = [],
    kafkastore.sasl.kerberos.kinit.cmd = /usr/bin/kinit,
    response.mediatype.default = application/vnd.schemaregistry.v1+json,
    resource.extension.classes = [], kafkastore.ssl.trustmanager.algorithm = PKIX,
    inter.instance.protocol = http, authentication.realm = , ssl.keystore.type = JKS,
    kafkastore.topic = _schemas, metrics.jmx.prefix = kafka.schema.registry,
    kafkastore.ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1,
    kafkastore.topic.replication.factor = 3, ssl.truststore.password = [hidden],
    kafkastore.timeout.ms = 500, host.name = schema-registry2,
    kafkastore.bootstrap.servers = [PLAINTEXT://kafka1:9092, PLAINTEXT://kafka2:9092, PLAINTEXT://kafka3:9092],
    schema.registry.zk.namespace = schemaregistry,
    kafkastore.sasl.kerberos.ticket.renew.window.factor = 0.8,
    kafkastore.sasl.kerberos.service.name = , schema.registry.resource.extension.class = [],
    ssl.endpoint.identification.algorithm = , compression.enable = true,
    kafkastore.ssl.truststore.type = JKS, avro.compatibility.level = none,
    kafkastore.ssl.protocol = TLS, kafkastore.ssl.provider = ,
    kafkastore.ssl.truststore.location = ,
    response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json],
    kafkastore.ssl.keystore.type = JKS, authentication.skip.paths = [],
    ssl.truststore.type = JKS, websocket.servlet.initializor.classes = [],
    kafkastore.ssl.truststore.password = [hidden], access.control.allow.origin = ,
    ssl.truststore.location = , ssl.keystore.password = [hidden], port = 8081,
    access.control.allow.headers = , kafkastore.ssl.keystore.location = ,
    metrics.tag.map = {}, master.eligibility = true, ssl.client.auth = false,
    kafkastore.ssl.keystore.password = [hidden], rest.servlet.initializor.classes = [],
    websocket.path.prefix = /ws, kafkastore.security.protocol = PLAINTEXT,
    ssl.trustmanager.algorithm = , authentication.method = NONE,
    request.logger.name = io.confluent.rest-utils.requests, ssl.key.password = [hidden],
    kafkastore.zk.session.timeout.ms = 30000, kafkastore.sasl.mechanism = GSSAPI,
    kafkastore.sasl.kerberos.ticket.renew.jitter = 0.05,
    kafkastore.ssl.key.password = [hidden], zookeeper.set.acl = false,
    schema.registry.inter.instance.protocol = , authentication.roles = [*],
    metrics.num.samples = 2, ssl.protocol = TLS,
    schema.registry.group.id = schema-registry,
    kafkastore.ssl.keymanager.algorithm = SunX509, kafkastore.connection.url = ,
    debug = false, listeners = [http://0.0.0.0:8081], kafkastore.group.id = ,
    ssl.provider = , ssl.enabled.protocols = [], shutdown.graceful.ms = 1000,
    ssl.keystore.location = , ssl.cipher.suites = [],
    kafkastore.ssl.endpoint.identification.algorithm = , kafkastore.ssl.cipher.suites = ,
    access.control.allow.methods = ,
    kafkastore.sasl.kerberos.min.time.before.relogin = 60000,
    ssl.keymanager.algorithm = , metrics.sample.window.ms = 30000,
    kafkastore.init.timeout.ms = 60000
 (io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig)
[2019-12-05 07:29:35,100] INFO Logging initialized @353ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2019-12-05 07:29:35,373] INFO Initializing KafkaStore with broker endpoints: PLAINTEXT://kafka1:9092,PLAINTEXT://kafka2:9092,PLAINTEXT://kafka3:9092 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2019-12-05 07:29:35,388] INFO AdminClientConfig values:
    bootstrap.servers = [PLAINTEXT://kafka1:9092, PLAINTEXT://kafka2:9092, PLAINTEXT://kafka3:9092]  [remaining values identical to the dumps above]
 (org.apache.kafka.clients.admin.AdminClientConfig)
[2019-12-05 07:29:35,451] INFO Kafka version : 2.1.1-cp3 (org.apache.kafka.common.utils.AppInfoParser)
[2019-12-05 07:29:35,451] INFO Kafka commitId : 74487c928927908f (org.apache.kafka.common.utils.AppInfoParser)
[2019-12-05 07:34:35,640] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:212)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:62)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:73)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:40)
    at io.confluent.rest.Application.createServer(Application.java:201)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:42)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: Timed out trying to create or validate schema topic configuration
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:172)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:114)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:210)
    ... 5 more
Caused by: java.util.concurrent.TimeoutException
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:274)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:165)
    ... 7 more
```
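For context, the timeout that fires here is the KafkaStore startup limit visible in the config dump above (`kafkastore.init.timeout.ms = 60000`). A minimal sketch, assuming the failure is just slow broker startup, of how that limit could be raised through the Docker image's environment-variable convention (the `SCHEMA_REGISTRY_` prefix with dots replaced by underscores); the service name is hypothetical, and this only buys time rather than explaining why the brokers respond slowly:

```bash
# Sketch only, not a confirmed fix: raise kafkastore.init.timeout.ms
# (60000 ms in the log above) via the image's env-var convention.
# The service name "mystack_schema-registry1" is hypothetical.
docker service update \
  --env-add SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_MS=120000 \
  mystack_schema-registry1
```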

OneCricketeer commented 4 years ago

Have you set the registry containers to depend on the startup of zookeeper & kafka containers?

ShowMeYourNoobs commented 4 years ago

Yes. The Kafka containers depend on the ZooKeeper containers, and the Schema Registry containers depend on the Kafka containers. Here is the start script that we call from the Schema Registry Dockerfile (in its CMD instruction):

```bash
#!/bin/bash

wait-for-it -h kafka1 -p 9092 -t 0
wait-for-it -h kafka2 -p 9092 -t 0
wait-for-it -h kafka3 -p 9092 -t 0

cub kafka-ready -b kafka1:9092 1 120
cub kafka-ready -b kafka2:9092 1 120
cub kafka-ready -b kafka3:9092 1 120

echo "OK. All Kafkas are ready"

./etc/confluent/docker/run
```

I use the wait-for-it script to check that the Kafka container is up, and the cub command to check that the Kafka broker inside it is actually up and running.
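One detail worth noting from the log above: cub kafka-ready logged "Expected 1 brokers but found only 0" and the script still printed "OK. All Kafkas are ready", which suggests the exit codes of the checks are never tested. A minimal hardened sketch of the same gate (hosts and ports taken from this thread; the bounded 60 s TCP wait and the `/etc/confluent/docker/run` entrypoint path are assumptions, not the author's confirmed setup):

```bash
#!/bin/bash
# Hardened sketch of the same readiness gate: abort on the first failed
# check instead of continuing unconditionally, and bound the TCP wait.
set -euo pipefail

for broker in kafka1 kafka2 kafka3; do
  wait-for-it -h "$broker" -p 9092 -t 60   # non-zero exit on timeout now aborts
  cub kafka-ready -b "$broker:9092" 1 120  # expect 1 broker within 120 s
done

echo "OK. All Kafkas are ready"
exec /etc/confluent/docker/run             # standard cp-image entrypoint (assumed path)
```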

pandalanax commented 2 years ago

Any updates?

ShowMeYourNoobs commented 2 years ago

It was some time ago, but as far as I remember we had a problem in one or more of our wait-for-it scripts. I don't know exactly what the problem was, maybe a misconfiguration or something else. After fixing that, everything worked fine every time we deployed.
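For anyone hitting the same symptom, a quick smoke test after redeploying can confirm the chain this thread describes is healthy; a sketch, assuming the broker host names used above and a registry reachable at `schema-registry2:8081` (the `host.name` and listener port shown in the log):

```bash
# Sketch: verify all three brokers are visible through one bootstrap list,
# then hit the registry's REST API (the /subjects endpoint lists schemas).
cub kafka-ready -b kafka1:9092,kafka2:9092,kafka3:9092 3 120
curl -fsS http://schema-registry2:8081/subjects && echo "schema registry is up"
```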