Open ThatRendle opened 8 years ago
Addendum: I Googled "Failed to write Noop record to kafka store" and found some issues on the schema-registry repo which mentioned it, but none of the workarounds there made any difference in this case.
@markrendle This looks like a limitation in docker-compose (see https://docs.docker.com/compose/faq/#how-do-i-get-compose-to-wait-for-my-database-to-be-ready-before-starting-my-application ). What's happening (based on how I saw it run locally) is that the schema-registry is being started up before other services, and other services are starting out of order as well (e.g. the REST proxy image was pulled and its container started first).
Unfortunately, docker-compose seems to assume that it exists in a vacuum and that there wouldn't be something else in production (e.g. init daemons, YARN/Mesos/similar) that would manage the healthchecks, restarting failed processes, etc., and in turn assumes that you don't ever want to use a process exiting with a non-zero exit code as a useful indication of errors... The version with links works because it correctly defines the dependencies and for historical reasons docker-compose has to respect them.
To make this work reliably in docker-compose, I think we'd need to run everything in the docker-images under an init daemon that is configured to always restart processes that die to get the behavior that docker-compose seems to assume.
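To illustrate the "always restart processes that die" behavior, here is a minimal sketch of a restart loop in Python. It is only a stand-in for a real init daemon and not what the images currently do; the function name `run_with_restart` and its parameters are hypothetical. Unlike a true "always restart" policy, it stops once the command exits cleanly.

```python
import subprocess
import time
from typing import Optional, Sequence


def run_with_restart(
    cmd: Sequence[str],
    max_restarts: Optional[int] = None,
    delay_s: float = 1.0,
) -> int:
    """Run cmd and restart it whenever it exits with a non-zero code.

    Returns 0 as soon as the command exits cleanly, or the last non-zero
    exit code once max_restarts restarts have been used up. With
    max_restarts=None the loop keeps restarting until a clean exit.
    """
    restarts = 0
    while True:
        code = subprocess.call(list(cmd))
        if code == 0:
            return 0
        if max_restarts is not None and restarts >= max_restarts:
            return code
        restarts += 1
        # Pause before the next attempt so a crashing process does not spin.
        time.sleep(delay_s)
```

A wrapper like this would be used as the container's entry point in place of the service's own start command, so that a service that dies because its dependency is not up yet simply gets started again.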
That said, even extending the timeouts by setting
SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_MS: 300000
SCHEMA_REGISTRY_KAFKASTORE_TIMEOUT_MS: 300000
under the schema-registry entry in the docker-compose.yml does not fix the problem. It looks like the Kafka broker startup might be getting interrupted somehow with a shutdown request (although the controller resignation is a bit confusing as there is only a single Kafka broker):
kafka | [2015-12-22 05:23:56,009] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@7e768ee6] (org.I0Itec.zkclient.ZkEventThread)
kafka | java.lang.IllegalStateException: Kafka scheduler has not been started
kafka | at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
kafka | at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
kafka | at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
kafka | at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
kafka | at kafka.utils.Utils$.inLock(Utils.scala:535)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
kafka | at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
kafka | at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
kafka | [2015-12-22 05:23:56,063] ERROR [KafkaApi-0] error when handling request Name: TopicMetadataRequest; Version: 0; CorrelationId: 48; ClientId: producer-1; Topics: _schemas (kafka.server.KafkaApis)
kafka | kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
kafka | at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:70)
kafka | at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:171)
kafka | at kafka.server.KafkaApis$$anonfun$19.apply(KafkaApis.scala:520)
kafka | at kafka.server.KafkaApis$$anonfun$19.apply(KafkaApis.scala:503)
kafka | at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
kafka | at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
kafka | at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
kafka | at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
kafka | at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
kafka | at scala.collection.SetLike$class.map(SetLike.scala:93)
kafka | at scala.collection.AbstractSet.map(Set.scala:47)
kafka | at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:503)
kafka | at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:542)
kafka | at kafka.server.KafkaApis.handle(KafkaApis.scala:62)
kafka | at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
kafka | at java.lang.Thread.run(Thread.java:745)
I don't have any concrete next steps to suggest yet aside from adjusting the different services to wait long enough for dependent services to start up (which I don't know will work reliably if they have to pull the images too...), but I figured I'd at least document some initial findings.
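For the "wait long enough for dependent services" idea, a rough sketch of a wait helper in Python follows. The function name `wait_for_port` and its parameters are hypothetical, and this is an untested idea for this stack rather than a confirmed fix. Note that an open TCP port only shows the process is listening; going by the "available brokers: 0" error above, the broker may accept connections before it has registered in ZooKeeper, so this check alone might not be enough.

```python
import socket
import time


def wait_for_port(
    host: str,
    port: int,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # name not resolvable yet, timed out) until the port is open.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Assuming the containers are reachable as zookeeper and kafka on their default ports (2181 and 9092), a start script for the schema-registry container could call `wait_for_port("zookeeper", 2181, timeout_s=300)` and then `wait_for_port("kafka", 9092, timeout_s=300)` before launching the registry, and exit with an error if either returns False.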
I'm trying to create a docker-compose file to bring up a dev stack on docker 1.9.1, which has the new network support in favour of which --link has been deprecated.
Ignoring the deprecation and using this docker-compose.yml with links works fine:
But when I try to use a docker 1.9 network bridge (called confluent) with this docker-compose.yml, the Schema Registry won't start:
The logs from the registry container look like this:
And while registry is trying to start, the kafka container logs this a bunch of times:
(I checked, and 172.19.0.2 was the IP address of the registry container.)
And then after it fails, the zookeeper container logs this:
I'm OK with using links for now, but it would be nice to get this resolved.