confluentinc / schema-registry

Confluent Schema Registry for Kafka
https://docs.confluent.io/current/schema-registry/docs/index.html
Other
2.2k stars 1.11k forks source link

Error upon restart of schema registry using Kafka SASL #717

Closed briandoran closed 5 years ago

briandoran commented 6 years ago

Schema registry 4.0.0 with Kafka 1.0.0 .. when I start it up first time it works fine without problem but when I do a docker restart .. or redeploy a docker container I get the following error and the service does not start. Any pointers would be greatly appreciated!

Thanks Brian

[2018-01-18 21:35:13,332] DEBUG Set SASL client state to CLIENT_COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) [2018-01-18 21:35:13,334] DEBUG Set SASL client state to COMPLETE (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator) [2018-01-18 21:35:13,335] DEBUG [AdminClient clientId=adminclient-1] Initiating API versions fetch from node 3. (org.apache.kafka.clients.NetworkClient) [2018-01-18 21:35:13,337] DEBUG [AdminClient clientId=adminclient-1] Recorded API versions for node 3: (Produce(0): 0 to 5 [usable: 5], Fetch(1): 0 to 6 [usable: 6], ListOffsets(2): 0 to 2 [usable: 2], Metadata(3): 0 to 5 [usable: 5], LeaderAndIsr(4): 0 to 1 [usable: 1], StopReplica(5): 0 [usable: 0], UpdateMetadata(6): 0 to 4 [usable: 4], ControlledShutdown(7): 0 to 1 [usable: 1], OffsetCommit(8): 0 to 3 [usable: 3], OffsetFetch(9): 0 to 3 [usable: 3], FindCoordinator(10): 0 to 1 [usable: 1], JoinGroup(11): 0 to 2 [usable: 2], Heartbeat(12): 0 to 1 [usable: 1], LeaveGroup(13): 0 to 1 [usable: 1], SyncGroup(14): 0 to 1 [usable: 1], DescribeGroups(15): 0 to 1 [usable: 1], ListGroups(16): 0 to 1 [usable: 1], SaslHandshake(17): 0 to 1 [usable: 1], ApiVersions(18): 0 to 1 [usable: 1], CreateTopics(19): 0 to 2 [usable: 2], DeleteTopics(20): 0 to 1 [usable: 1], DeleteRecords(21): 0 [usable: 0], InitProducerId(22): 0 [usable: 0], OffsetForLeaderEpoch(23): 0 [usable: 0], AddPartitionsToTxn(24): 0 [usable: 0], AddOffsetsToTxn(25): 0 [usable: 0], EndTxn(26): 0 [usable: 0], WriteTxnMarkers(27): 0 [usable: 0], TxnOffsetCommit(28): 0 [usable: 0], DescribeAcls(29): 0 [usable: 0], CreateAcls(30): 0 [usable: 0], DeleteAcls(31): 0 [usable: 0], DescribeConfigs(32): 0 [usable: 0], AlterConfigs(33): 0 [usable: 0], AlterReplicaLogDirs(34): 0 [usable: 0], DescribeLogDirs(35): 0 [usable: 0], SaslAuthenticate(36): 0 [usable: 0], CreatePartitions(37): 0 [usable: 0]) (org.apache.kafka.clients.NetworkClient) [2018-01-18 21:35:13,347] INFO Validating schemas topic _schemas (io.confluent.kafka.schemaregistry.storage.KafkaStore) [2018-01-18 21:35:13,349] DEBUG [AdminClient clientId=adminclient-1] Queueing Call(callName=describeTopics, deadlineMs=1516311433348) with a timeout 120000 ms from now. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,357] DEBUG [AdminClient clientId=adminclient-1] Queueing Call(callName=describeConfigs, deadlineMs=1516311433356) with a timeout 120000 ms from now. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,363] DEBUG [AdminClient clientId=adminclient-1] Initiating close operation. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,363] DEBUG [AdminClient clientId=adminclient-1] Waiting for the I/O thread to exit. Hard shutdown in 31536000000 ms. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,366] DEBUG Removed sensor with name connections-closed: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,366] DEBUG Removed sensor with name connections-created: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,366] DEBUG Removed sensor with name successful-authentication: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,366] DEBUG Removed sensor with name failed-authentication: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,367] DEBUG Removed sensor with name bytes-sent-received: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,369] DEBUG Removed sensor with name bytes-sent: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,369] DEBUG Removed sensor with name bytes-received: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,370] DEBUG Removed sensor with name select-time: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,370] DEBUG Removed sensor with name io-time: (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,371] DEBUG Removed sensor with name node--1.bytes-sent (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,371] DEBUG Removed sensor with name node--1.bytes-received (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,371] DEBUG Removed sensor with name node--1.latency (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,376] DEBUG Removed sensor with name node--3.bytes-sent (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,376] DEBUG Removed sensor with name node--3.bytes-received (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,376] DEBUG Removed sensor with name node--3.latency (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,377] DEBUG Removed sensor with name node--2.bytes-sent (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,377] DEBUG Removed sensor with name node--2.bytes-received (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,377] DEBUG Removed sensor with name node--2.latency (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,379] DEBUG Removed sensor with name node-3.bytes-sent (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,379] DEBUG Removed sensor with name node-3.bytes-received (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,379] DEBUG Removed sensor with name node-3.latency (org.apache.kafka.common.metrics.Metrics) [2018-01-18 21:35:13,380] WARN [Principal=kafka_client@wakka.wakka.COM]: TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.KerberosLogin) [2018-01-18 21:35:13,380] DEBUG [AdminClient clientId=adminclient-1] Exiting AdminClientRunnable thread. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,381] DEBUG [AdminClient clientId=adminclient-1] Kafka admin client closed. (org.apache.kafka.clients.admin.KafkaAdminClient) [2018-01-18 21:35:13,381] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry) [2018-01-18 21:35:13,384] ERROR Server died unexpectedly: (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain)

goldv commented 6 years ago

I'm seeing the same error. When I delete the _schemas topic, the registry starts ok, however subsequent re-starts consistently fail.

[2018-01-24 16:13:32,726] INFO Logging initialized @906ms (org.eclipse.jetty.util.log:186) [2018-01-24 16:13:33,625] INFO Initializing KafkaStore with broker endpoints: PLAINTEXT://xstm6518pap.stm.swissbank.com:9092 (io.confluent.kafka.schemaregistry.storage.KafkaStore:103) [2018-01-24 16:13:34,529] INFO Validating schemas topic _schemas (io.confluent.kafka.schemaregistry.storage.KafkaStore:228) [2018-01-24 16:13:34,614] WARN The replication factor of the schema topic _schemas is less than the desired one of 3. If this is a production environment, it's crucial to add more brokers and increase the replication factor of the topic. (io.confluent.kafka.schemaregistry.storage.KafkaStore:242) [2018-01-24 16:13:34,628] INFO Shutting down schema registry (io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry:719) [2018-01-24 16:13:34,631] ERROR Server died unexpectedly: (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:51) java.lang.NullPointerException at io.confluent.kafka.schemaregistry.storage.KafkaStore.close(KafkaStore.java:366) at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.close(KafkaSchemaRegistry.java:720) at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.onShutdown(SchemaRegistryRestApplication.java:111) at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:66) at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:42) at io.confluent.rest.Application.createServer(Application.java:157) at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:43)

briandoran commented 6 years ago

Could someone in confluent provide any feedback. Is there something specific that needs to be done when restarting a schema-registry docker container when connecting to kafka using kerberos?

In order to get the system started again I need to reinstall my kerberos and kafka ..

_[2018-01-18 21:35:13,380] WARN [Principal=kafkaclient@wakka.wakka.COM]: TGT renewal thread has been interrupted and will exit. (org.apache.kafka.common.security.kerberos.KerberosLogin)

icearn commented 6 years ago

@briandoran @goldv Did you guys make any progress on this? I experienced the same issue with the same version of schema-registry 4.0.0 and kafka 1.0.0. I've tested that schema-registry works fine when connecting to plaintext kafka, just when it moves to SASL_SSL this issue pop up.

icearn commented 6 years ago

hi guys, I figured out the answer for this problem. It is not kerberos issue, but the ACL issue. First I would like to recommend this doc https://www.confluent.io/blog/securing-confluent-schema-registry-apache-kafka/

I tried to add "zookeeper.set.acl=true" to both the broker and the schema-registry client but nothing changed, SASL connection still broke. So after various approach I found it's the acl control breaks the producer connection - rather than the adminclient connection.

Here is how you can fix it -

  1. configure schema-registry properties file: kafkastore.group.id=

  2. add all these acl operations:

kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User: --topic _schemas --producer

kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User: --topic _schemas --consumer --group

kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User: --topic __consumer_offsets --operation Describe

kafka-acls --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User: --topic _schemas --operation DescribeConfigs

bobby259 commented 6 years ago

@icearn The solution you provided was spot on. It worked like a charm for me.

chronidev commented 6 years ago

The solution you provide permits to startup the schema registry, but there are still issues, if you check log on broker you should see deny logs for DESCRIBE on all topics.

rayokota commented 5 years ago

The NPE was fixed in 4.1.x