confluentinc / cp-ansible

Ansible playbooks for the Confluent Platform

Kafka expects to store log data in a persistent location #84

Closed jvalderrama closed 5 years ago

jvalderrama commented 5 years ago

I have a complete installation and it is working quite well, but there is an ERROR message in the control-center log that is driving me crazy:

ERROR [control-center-heartbeat-1] broker=1 is storing logs in /tmp/kafka-logs, Kafka expects to store log data in a persistent location (io.confluent.controlcenter.healthcheck.HealthCheck)

How can I fix this issue?

My current configuration in /etc/kafka/server.properties is:

# Maintained by Ansible
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://10.251.64.5:9092

zookeeper.connect=10.251.64.8:2181,10.251.64.7:2181,10.251.64.5:2181
--override log.dir=/var/lib/kafka/data
broker.id=1

log.segment.bytes=1073741824
socket.receive.buffer.bytes=102400
socket.send.buffer.bytes=102400
confluent.metrics.reporter.topic.replicas=3
num.network.threads=8
ssl.endpoint.identification.algorithm=
num.io.threads=16
confluent.metrics.reporter.ssl.endpoint.identification.algorithm=
transaction.state.log.min.isr=2
zookeeper.connection.timeout.ms=6000
offsets.topic.replication.factor=3
socket.request.max.bytes=104857600
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
auto.create.topics.enable=False
num.recovery.threads.per.data.dir=2
transaction.state.log.replication.factor=3
confluent.metrics.reporter.bootstrap.servers=10.251.64.5:9092
log.retention.hours=168
num.partitions=1

# Confluent Support
confluent.support.metrics.enable=true
confluent.support.customer.id=anonymous
OneCricketeer commented 5 years ago

This line of your config looks a little off:

--override log.dir=/var/lib/kafka/data

Perhaps you meant log.dirs anyway? That property is rendered from the broker.datadir list config.
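
Something like this in your inventory vars, if I recall the 5.1.x layout correctly (a sketch, not copied from this repo's docs):

# group_vars sketch: broker.datadir is a list the role joins into log.dirs
kafka:
  broker:
    datadir:
      - /var/lib/kafka/data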

OneCricketeer commented 5 years ago

Also, note, the error is from Control Center. You should show its configuration, not the Kafka broker's.

jvalderrama commented 5 years ago

The line of config (I was missing an 's', sorry) had been set with --override due to the following error:

FATAL [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.KafkaException: Found directory /var/lib/kafka/data, 'data' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
Kafka's log directories (and children) should only contain Kafka topic data.

The above happens because the cluster has a data disk attached at that location, and it did not work without the --override prefix.

Therefore, to solve the issue it was necessary to set

--override log.dirs=/var/lib/kafka/data

Please refer here: https://stackoverflow.com/questions/49854532/kafkas-log-directories-should-only-contain-kafka-topic-data

But it seems that the Control Center health check does not pick it up correctly, and gives the error:

ERROR [control-center-heartbeat-1] broker=1 is storing logs in /tmp/kafka-logs, Kafka expects to store log data in a persistent location (io.confluent.controlcenter.healthcheck.HealthCheck)

I have been checking the control-center configuration but I cannot see what is happening yet.

OneCricketeer commented 5 years ago

AFAICT, the linked question is about Docker container commands, and --override is only useful for kafka-server-start.

I think the accepted answer meant to exclude the --override and only put the key-value pair into the properties file.

The fact of the matter is that you have an invalid property there, so the broker is defaulting to /tmp/kafka-logs.
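
To illustrate the difference (a sketch; the path is the one from your config):

# --override is an argument to the startup script, not a line in the file:
kafka-server-start /etc/kafka/server.properties --override log.dirs=/var/lib/kafka/data

# Inside server.properties itself, only the bare key=value belongs:
log.dirs=/var/lib/kafka/data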

The other error you mentioned would only be a problem if there's content within the directory that is not of the format listed in the error.

jvalderrama commented 5 years ago

I have been able to solve the issue. The disk attached at /var/lib/kafka/data had a lost+found folder (left behind by fsck), so Kafka said the volume was not in the correct format. In the end I left the configuration file like this:

log.dirs=/var/lib/kafka/data

Then I removed the lost+found directory from the attached data disk at /var/lib/kafka/data, the error is gone, and the cluster is working fine.
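
Roughly what I ran, in case it helps someone (commands approximate for my setup):

# lost+found is created by fsck on ext filesystems; Kafka aborts startup if
# log.dirs contains anything that is not topic-partition data
sudo rm -rf /var/lib/kafka/data/lost+found
sudo systemctl restart confluent-kafka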

@cricket007 thanks for your help. Hope this helps with further related questions.

OneCricketeer commented 5 years ago

Welcome. Feel free to close the issue

OneCricketeer commented 5 years ago

And for reference, how did you manage to override the template?

https://github.com/confluentinc/cp-ansible/blob/5.1.x/roles/confluent.kafka-broker/templates/includes/base_server.properties.j2#L3

You're not supposed to be manually editing these files

jvalderrama commented 5 years ago

Because I attached a disk for /var/lib/kafka/data, the confluent-kafka.service failed, so I manually changed /etc/kafka/server.properties to override the parameter. Just that!

OneCricketeer commented 5 years ago

Even if you attached a disk, though, you should be able to change the kafka.broker property, then re-run the playbook, and it'll update the properties for you.
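
Something along these lines (playbook and inventory names assume the standard cp-ansible workflow from the README):

# after updating kafka.broker.datadir in hosts.yml, re-render the templates
ansible-playbook -i hosts.yml all.yml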

One potential take-away here is that Ansible could be smarter and do that pre-check first to make sure the data directory is empty.

jvalderrama commented 5 years ago

OK, thanks for your advice; I'll keep it for future reference. Many thanks for the answers, they were very useful in solving the problem.