confluentinc / schema-registry

Confluent Schema Registry for Kafka
https://docs.confluent.io/current/schema-registry/docs/index.html

Error: Schema Registry failed to start #1708

Open iguenkinrl opened 3 years ago

iguenkinrl commented 3 years ago

I am running Confluent Platform 6 on a single node in a development environment. After running confluent local destroy, I got the following output:

~/confluent$ confluent local services start
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /tmp/confluent.646702
Starting ZooKeeper
ZooKeeper is [UP]
Starting Kafka
Kafka is [UP]
Starting Schema Registry
Error: Schema Registry failed to start

The log shows a timeout message:

[2020-12-13 01:22:41,214] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry:292)
[2020-12-13 01:23:41,230] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:75)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:308)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:73)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:88)
    at io.confluent.rest.Application.configureHandler(Application.java:254)
    at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:196)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector.init(KafkaGroupLeaderElector.java:210)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:303)

I tried changing the kafkastore.topic value to a new topic, with the same result.

Dverhun commented 3 years ago

I have the same problem with Confluent Platform 5.4.2.

A standalone Schema Registry cannot start after a maintenance restart, whereas a cluster with 2+ nodes works properly.

[2021-02-12 07:21:05,174] INFO SchemaRegistryConfig values: 
    access.control.allow.headers = 
    access.control.allow.methods = 
    access.control.allow.origin = 
    authentication.method = NONE
    authentication.realm = 
    authentication.roles = [*]
    authentication.skip.paths = []
    avro.compatibility.level = backward
    compression.enable = true
    debug = true
    host.name = registry-1.kafka
    idle.timeout.ms = 30000
    inter.instance.headers.whitelist = []
    inter.instance.protocol = https
    kafkastore.bootstrap.servers = [broker-1:9093, broker-2:9093, broker-3:9093]
    kafkastore.connection.url = 
    kafkastore.group.id = 
    kafkastore.init.timeout.ms = 60000
    kafkastore.sasl.kerberos.kinit.cmd = /usr/bin/kinit
    kafkastore.sasl.kerberos.min.time.before.relogin = 60000
    kafkastore.sasl.kerberos.service.name = kafka
    kafkastore.sasl.kerberos.ticket.renew.jitter = 0.05
    kafkastore.sasl.kerberos.ticket.renew.window.factor = 0.8
    kafkastore.sasl.mechanism = GSSAPI
    kafkastore.security.protocol = SASL_SSL
    kafkastore.ssl.cipher.suites = 
    kafkastore.ssl.enabled.protocols = TLSv1.2,TLSv1.1,TLSv1
    kafkastore.ssl.endpoint.identification.algorithm = 
    kafkastore.ssl.key.password = [hidden]
    kafkastore.ssl.keymanager.algorithm = SunX509
    kafkastore.ssl.keystore.location = /ssl/schema_registry.keystore.jks
    kafkastore.ssl.keystore.password = [hidden]
    kafkastore.ssl.keystore.type = JKS
    kafkastore.ssl.protocol = TLS
    kafkastore.ssl.provider = 
    kafkastore.ssl.trustmanager.algorithm = PKIX
    kafkastore.ssl.truststore.location = /ssl/schema_registry.truststore.jks
    kafkastore.ssl.truststore.password = [hidden]
    kafkastore.ssl.truststore.type = JKS
    kafkastore.timeout.ms = 500
    kafkastore.topic = _schemas
    kafkastore.topic.replication.factor = 3
    kafkastore.write.max.retries = 5
    kafkastore.zk.session.timeout.ms = 30000
    listeners = [https://0.0.0.0:8081]
    master.eligibility = true
    metric.reporters = []
    metrics.jmx.prefix = kafka.schema.registry
    metrics.num.samples = 2
    metrics.sample.window.ms = 30000
    metrics.tag.map = []
    mode.mutability = false
    port = 8081
    request.logger.name = io.confluent.rest-utils.requests
    resource.extension.class = []
    resource.extension.classes = []
    resource.static.locations = []
    response.mediatype.default = application/vnd.schemaregistry.v1+json
    response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json]
    rest.servlet.initializor.classes = []
    schema.registry.group.id = schema-registry
    schema.registry.inter.instance.protocol = 
    schema.registry.resource.extension.class = []
    schema.registry.zk.namespace = schema_registry
    shutdown.graceful.ms = 1000
    ssl.cipher.suites = []
    ssl.client.auth = false
    ssl.client.authentication = NONE
    ssl.enabled.protocols = []
    ssl.endpoint.identification.algorithm = null
    ssl.key.password = [hidden]
    ssl.keymanager.algorithm = 
    ssl.keystore.location = /ssl/schema_registry.keystore.jks
    ssl.keystore.password = [hidden]
    ssl.keystore.type = JKS
    ssl.protocol = TLS
    ssl.provider = 
    ssl.trustmanager.algorithm = 
    ssl.truststore.location = /ssl/schema_registry.truststore.jks
    ssl.truststore.password = [hidden]
    ssl.truststore.type = JKS
    websocket.path.prefix = /ws
    websocket.servlet.initializor.classes = []
    zookeeper.set.acl = false
 (io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig)
[2021-02-12 07:21:05,241] INFO Logging initialized @1038ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2021-02-12 07:21:05,344] WARN The configuration ssl.client.auth is deprecated and should be replaced with ssl.client.authentication (io.confluent.rest.ApplicationServer)
[2021-02-12 07:21:05,367] INFO Adding listener: https://0.0.0.0:8081 (io.confluent.rest.ApplicationServer)
[2021-02-12 07:21:06,162] INFO Initializing KafkaStore with broker endpoints: SASL_SSL://broker-1:9093,SASL_SSL://broker-2:9093,SASL_SSL://broker-3:9093 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-02-12 07:21:09,406] INFO Validating schemas topic _schemas (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-02-12 07:21:10,542] INFO Kafka store reader thread starting consumer (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-02-12 07:21:11,404] INFO Initialized last consumed offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-02-12 07:21:11,407] INFO [kafka-store-reader-thread-_schemas]: Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread)
[2021-02-12 07:21:11,541] INFO Wait to catch up until the offset at 71 (io.confluent.kafka.schemaregistry.storage.KafkaStore)
[2021-02-12 07:21:12,326] INFO Joining schema registry with Kafka-based coordination (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry)
[2021-02-12 07:22:12,550] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:267)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:75)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
    at io.confluent.rest.Application.configureHandler(Application.java:217)
    at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:185)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete
    at io.confluent.kafka.schemaregistry.masterelector.kafka.KafkaGroupMasterElector.init(KafkaGroupMasterElector.java:204)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:262)
    ... 6 more
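
The "Joining schema registry" and "Error starting the schema registry" entries in the log above are almost exactly one minute apart, which lines up with the kafkastore.init.timeout.ms = 60000 value printed in the config dump. A minimal Python sketch of that arithmetic follows. That the join-group wait is bounded by this setting is an inference from the numbers, not something confirmed in this thread.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the log lines quoted in this thread.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Value of kafkastore.init.timeout.ms taken from the config dump above.
INIT_TIMEOUT_MS = 60000


def parse_ts(line: str) -> datetime:
    """Parse the leading '[YYYY-MM-DD HH:MM:SS,mmm]' timestamp of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, LOG_TS_FORMAT)


join_line = "[2021-02-12 07:21:12,326] INFO Joining schema registry with Kafka-based coordination"
error_line = "[2021-02-12 07:22:12,550] ERROR Error starting the schema registry"

# Integer milliseconds between the two log entries.
elapsed_ms = (parse_ts(error_line) - parse_ts(join_line)) // timedelta(milliseconds=1)

print(f"elapsed: {elapsed_ms} ms, configured init timeout: {INIT_TIMEOUT_MS} ms")
```

If the gap tracks the timeout, raising the timeout alone is unlikely to help; the group join is probably never completing, which points back at broker connectivity or host name settings.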

Dverhun commented 3 years ago

@iguenkinrl The main cause of this issue:

There are connectivity problems to the brokers, so Schema Registry cannot read the schemas and everything else stored in the _schemas topic.

That is why it fails.
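
One quick way to test the connectivity claim above is to try opening a TCP connection to each broker from the machine that runs Schema Registry. The sketch below is only a first-pass check: the function name broker_reachable is hypothetical, and a successful TCP connect says nothing about whether the SASL_SSL handshake or authentication would succeed.

```python
import socket


def broker_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    This only shows that the port accepts connections; it does not
    validate TLS, SASL, or the addresses the brokers advertise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For the configuration shown earlier in the thread, this would be called as broker_reachable("broker-1", 9093), and likewise for broker-2 and broker-3.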

leshibily commented 3 years ago

Is there a fix for this issue?

parisholley commented 3 years ago

I ran into this issue because, even though the cluster was accessible via localhost:port, the brokers registered themselves with their private network IP. I fixed it by updating kafkastore.bootstrap.servers in schema-registry.properties so that it no longer uses localhost.
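
To spot the situation described above, a small script can flag kafkastore.bootstrap.servers entries that point at a loopback host. This is a rough sketch: the helper name and the sample properties text are made up for illustration, and a loopback entry is only a hint, since whether it is wrong depends on the addresses the brokers advertise.

```python
from typing import List

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_bootstrap_servers(properties_text: str) -> List[str]:
    """Return kafkastore.bootstrap.servers entries that use a loopback host."""
    flagged = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() != "kafkastore.bootstrap.servers":
            continue
        for server in value.split(","):
            server = server.strip()
            # Drop an optional protocol prefix such as PLAINTEXT://
            hostport = server.split("://", 1)[-1]
            # Strip the port, then any IPv6 brackets.
            host = hostport.rsplit(":", 1)[0].strip("[]")
            if host in LOOPBACK_HOSTS:
                flagged.append(server)
    return flagged


# Hypothetical excerpt of a schema-registry.properties file.
sample = """
# schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=PLAINTEXT://localhost:9092,PLAINTEXT://192.168.1.10:9092
"""

print(loopback_bootstrap_servers(sample))
```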

Boltmerz commented 3 years ago

@parisholley Can you please share your updated schema-registry.properties? Commenting out kafkastore.bootstrap.servers did not solve the issue for me.

ashokc commented 3 years ago

@Boltmerz Likewise. Moving to IP addresses from 'localhost' did NOT help.

Schema Registry comes up fine the FIRST time I run

confluent local services start

But a subsequent 'stop' & 'start' errors out at Schema Registry:

ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:75)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryException: Failed to get Kafka cluster ID

ashokc commented 3 years ago

The community edition (6.2) has similar issues as well. The first start goes through fine, but an immediate stop and start throws errors like the one below. I am not sure whether it is trying to recreate the '_schemas' topic.

[2021-07-05 10:44:21,987] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:75)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:312)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:73)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:88)
    at io.confluent.rest.Application.configureHandler(Application.java:256)
    at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:227)
    at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
    at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)
Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: Failed trying to create or validate schema topic configuration
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:190)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:121)
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:310)
    ... 6 more
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
    at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
    at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.verifySchemaTopic(KafkaStore.java:248)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.createSchemaTopic(KafkaStore.java:233)
    at io.confluent.kafka.schemaregistry.storage.KafkaStore.createOrVerifySchemaTopic(KafkaStore.java:182)
    ... 8 more
Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.

If I use a different topic name and restart, it starts up fine! I changed

kafkastore.topic=_schemas

to

kafkastore.topic=_schemas2

If I restart again, it fails with the same error, now for '_schemas2':

[2021-07-05 10:57:49,432] INFO Validating schemas topic _schemas2 (io.confluent.kafka.schemaregistry.storage.KafkaStore:244)
[2021-07-05 10:57:49,446] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:75)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
    at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:312)
    ...

atrbgithub commented 3 years ago

@iguenkinrl

This fixed it for me with confluent platform 6.2:

SCHEMA_REGISTRY_HOST_NAME: localhost

Without that, the Schema Registry would fail to start correctly and I would get an error similar to yours. This seemed to work fine in previous versions; after moving to 6.2 I found that the above must be set. Shelling into the pod also revealed that it was not listening on 8081. Eventually it would time out and stop with this error:

io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryTimeoutException: Timed out waiting for join group to complete at

My full environment block for the Schema Registry looks as follows:

    environment:
      LOG_LEVEL: "INFO"
      LOG_DIR: /var/log/confluent
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
      SCHEMA_REGISTRY_LOG4J_OPTS: "-Dlog4j.configuration=file:/etc/kafka/log4j.properties"
      SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR: 1
      SCHEMA_REGISTRY_HOST_NAME: localhost

All works fine then.
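
For readers comparing the environment block above with the properties discussed earlier in the thread: in the Confluent Docker images, a SCHEMA_REGISTRY_ variable corresponds to a property name with the prefix removed, lower-cased, and single underscores turned into periods, so SCHEMA_REGISTRY_HOST_NAME sets host.name. A rough Python sketch of that mapping follows. The function name is hypothetical, and the sketch deliberately handles only single underscores; any escaping rules for dashes or literal underscores are out of scope here. Not every variable is a property either: SCHEMA_REGISTRY_LOG4J_OPTS above carries JVM options.

```python
PREFIX = "SCHEMA_REGISTRY_"


def env_to_property(name: str) -> str:
    """Map a SCHEMA_REGISTRY_* variable name to a dotted property name.

    Simplified on purpose: only single underscores are supported, so
    names containing '__' are rejected instead of being guessed at.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"{name!r} does not start with {PREFIX!r}")
    body = name[len(PREFIX):]
    if "__" in body:
        raise ValueError("multi-underscore escapes are not handled by this sketch")
    return body.lower().replace("_", ".")


for var in (
    "SCHEMA_REGISTRY_HOST_NAME",
    "SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS",
    "SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR",
):
    print(f"{var} -> {env_to_property(var)}")
```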

k0mmsussert0d commented 2 years ago

In my case, switching from Java 11 to 1.8 resolved the issue. It's strange, since everything worked fine during the first launch with OpenJDK 11.

patrickpang commented 2 years ago

Same problem here on Java 11. Fixed by restarting the shell.

Hamzablm commented 2 years ago

Switching the default Java version to Java 8 worked for me too.
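
Since several comments above tie the failure to the Java version in use, it can help to confirm which major version the shell actually picks up. The sketch below parses the text printed by java -version; the function name and the sample strings are illustrative and not taken from this thread.

```python
import re


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_292" means Java 8) and the
    scheme used since Java 9 ("11.0.11" means Java 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


# Illustrative sample strings in the two formats.
print(java_major_version('openjdk version "11.0.11" 2021-04-20'))
print(java_major_version('java version "1.8.0_292"'))
```

Note that java -version writes to standard error, so when capturing it from a script (for example with subprocess.run and capture_output=True) the text to parse is in stderr rather than stdout.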