canonical / kafka-operator

Kafka VM operator

Debug log is being filled with java exceptions and other recurring messages #76

Closed: sed-i closed this issue 1 year ago

sed-i commented 1 year ago

I have the following deployment:

Model    Controller    Cloud/Region         Version  SLA          Timestamp
default  machineworld  localhost/localhost  2.9.38   unsupported  16:22:24-05:00

SAAS  Status  Store                URL
prom  active  charm-dev-batteries  admin/welcome.prom

App    Version  Status  Scale  Charm          Channel  Rev  Exposed  Message
agent           active      1  grafana-agent             9  no       
kafka           active      1  kafka          edge       0  no       
zk              active      1  zookeeper      edge      90  no       

Unit        Workload  Agent  Machine  Public address  Ports  Message
kafka/2*    active    idle   4        10.30.254.40           
  agent/7*  active    idle            10.30.254.40           
zk/0*       active    idle   2        10.30.254.66           

Machine  State    Address       Inst id        Series  AZ  Message
2        started  10.30.254.66  juju-0728c2-2  jammy       Running
4        started  10.30.254.40  juju-0728c2-4  jammy       Running

Relation provider          Requirer                 Interface                Type         Message
kafka:cluster              kafka:cluster            cluster                  peer         
kafka:juju-info            agent:juju-info          juju-info                subordinate  
kafka:restart              kafka:restart            rolling_op               peer         
prom:receive-remote-write  agent:send-remote-write  prometheus_remote_write  regular      
zk:cluster                 zk:cluster               cluster                  peer         
zk:restart                 zk:restart               rolling_op               peer         
zk:zookeeper               kafka:zookeeper          zookeeper                regular      

There are currently three kinds of debug messages that fill the debug-log. They take up multiple screens, and I can't tell what to do about them.

Is this part of the normal lifecycle? Should these long messages be absorbed in charm code and replaced by shorter, more telling messages?

Failed command

unit-kafka-2: 16:15:15.148 DEBUG unit.kafka/2.juju-log Command 'KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties' returned non-zero exit status 1.
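
For illustration, here is a minimal sketch (not the charm's actual code) of how such a failure could be collapsed into a single short DEBUG line, keeping the full command and stderr out of the log; the helper name and its shape are assumptions:

```python
# Hypothetical sketch, not the kafka-operator's actual code.
import logging
import subprocess

logger = logging.getLogger(__name__)


def run_bin_command(cmd: list[str]) -> str:
    """Run a Kafka CLI command, logging only a short summary on failure."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.PIPE, text=True)
    except subprocess.CalledProcessError as e:
        # One short, telling line instead of the full command + stderr dump.
        last_line = e.stderr.strip().splitlines()[-1] if e.stderr and e.stderr.strip() else "no stderr"
        logger.debug("%s failed (exit %d): %s", cmd[0], e.returncode, last_line)
        raise
```

Summarizing like this would also keep the SCRAM password (visible in the command above) out of the debug-log.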

Server properties

Could this be replaced by an action? E.g. juju run-action kafka/2 get-server-properties --wait

unit-kafka-2: 16:16:19.698 DEBUG unit.kafka/2.juju-log server properties: ['super.users=User:sync;User:admin', 'log.dirs=/var/snap/charmed-kafka/common/log-data/2', 'listener.security.protocol.map=INTERNAL_SASL_PLAINTEXT:SASL_PLAINTEXT', 'listeners=INTERNAL_SASL_PLAINTEXT://:19092', 'advertised.listeners=INTERNAL_SASL_PLAINTEXT://10.30.254.40:19092', 'inter.broker.listener.name=INTERNAL_SASL_PLAINTEXT', 'compression.type=producer', 'log.flush.interval.messages=9223372036854775807', 'log.flush.interval.ms=9223372036854775807', 'log.flush.offset.checkpoint.interval.ms=60000', 'log.retention.bytes=-1', 'log.retention.ms=-1', 'log.segment.bytes=1073741824', 'message.max.bytes=1048588', 'offsets.topic.num.partitions=50', 'transaction.state.log.num.partitions=50', 'unclean.leader.election.enable=False', 'log.cleaner.delete.retention.ms=86400000', 'log.cleaner.min.compaction.lag.ms=0', 'log.cleanup.policy=delete', 'log.message.timestamp.type=CreateTime', 'replication.quota.window.num=11', 'listener.name.internal_sasl_plaintext.scram-sha-512.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="sync" password="AbZal5OW65hIdcda9tIxMzKvld4H1qMl";', 'default.replication.factor=1', 'num.partitions=1', 'transaction.state.log.replication.factor=1', 'offsets.topic.replication.factor=1', 'min.insync.replicas=1', 'transaction.state.log.min.isr=1', 'broker.id=2', 'zookeeper.connect=10.30.254.66:2181/kafka', '', 'sasl.enabled.mechanisms=SCRAM-SHA-512', 'sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512', 'authorizer.class.name=kafka.security.authorizer.AclAuthorizer', 'allow.everyone.if.no.acl.found=false', '']
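
As a rough illustration of the action idea, a hedged sketch follows; the action name get-server-properties, the properties path, and the handler are assumptions rather than anything the charm ships today:

```python
# Hypothetical sketch of a `get-server-properties` action (not implemented in the charm).
#
# actions.yaml (assumed):
#   get-server-properties:
#     description: Return the broker's server.properties via action results.

from pathlib import Path

from ops.charm import ActionEvent, CharmBase
from ops.main import main

# Assumed location used by the charmed-kafka snap (taken from the command above).
PROPERTIES_PATH = Path("/var/snap/charmed-kafka/common/server.properties")


class KafkaCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(
            self.on.get_server_properties_action, self._on_get_server_properties
        )

    def _on_get_server_properties(self, event: ActionEvent) -> None:
        if not PROPERTIES_PATH.exists():
            event.fail(f"{PROPERTIES_PATH} not found")
            return
        # Returned via action results, so nothing ends up in debug-log.
        event.set_results({"properties": PROPERTIES_PATH.read_text()})


if __name__ == "__main__":
    main(KafkaCharm)
```

The operator could then fetch the properties on demand with the juju run-action invocation suggested above, instead of having them dumped into the log on every hook.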

Java exceptions

unit-zk-0: 16:13:49.827 DEBUG unit.zk/0.juju-log Operator Framework 2.0.0 up and running.
unit-zk-0: 16:13:49.886 DEBUG unit.zk/0.juju-log Invalid Prometheus alert rules folder at /var/lib/juju/agents/unit-zk-0/charm/src/prometheus_alert_rules: directory does not exist
unit-zk-0: 16:13:49.892 DEBUG unit.zk/0.juju-log Emitting Juju event update_status.
unit-zk-0: 16:13:49.896 WARNING unit.zk/0.juju-log No relation: certificates
unit-kafka-2: 16:13:05.018 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
Use --bootstrap-server instead to specify a broker to connect to.
, stderr=[2023-03-07 21:12:33,729] WARN Session 0x0 for sever juju-0728c2-2.lxd/10.30.254.66:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[2023-03-07 21:12:34,833] WARN Session 0x0 for sever juju-0728c2-2.lxd/10.30.254.66:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[2023-03-07 21:12:35,935] WARN Session 0x0 for sever juju-0728c2-2.lxd/10.30.254.66:2181, Closing socket connection. 

# AND ON AND ON IT GOES
marcoppenheimer commented 1 year ago

Barring the server properties one (which can probably be removed entirely), the others are logging failed attempts to make a client connection to ZooKeeper, which is normal, as ZooKeeper is slow. We log the errors at DEBUG to keep track of the attempts made, and then defer. You're right that it does rack up a fairly hefty DEBUG log. We use INFO-level logs for messages that are valuable to the user (and to charm developers); would that work for you?
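
For context, the pattern described here looks roughly like the following sketch (names are illustrative, not the charm's actual code): each failed attempt produces a short DEBUG line, the user-facing state goes to INFO, and the event is deferred so it is retried on a later hook.

```python
# Illustrative sketch of the "log at DEBUG, then defer" pattern described above.
import logging

from ops.charm import CharmBase, StartEvent
from ops.main import main

logger = logging.getLogger(__name__)


class ExampleKafkaCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(self.on.start, self._on_start)

    def _add_user(self, username: str) -> None:
        """Assumed helper that shells out to `charmed-kafka.configs` and raises on failure."""
        raise NotImplementedError

    def _on_start(self, event: StartEvent) -> None:
        try:
            self._add_user("sync")
        except Exception as e:
            # Short DEBUG entry per attempt, instead of the full command + Java traceback.
            logger.debug("add_user failed, will retry: %s", type(e).__name__)
            logger.info("waiting for ZooKeeper to become reachable")
            event.defer()  # re-run this hook later instead of erroring out
            return
        logger.info("kafka user 'sync' created")


if __name__ == "__main__":
    main(ExampleKafkaCharm)
```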

sed-i commented 1 year ago

Yeah, DEBUG is certainly on the chattier side by design.

So I have zk related to kafka (for several days now), and KafkaAuth.add_user keeps failing (per the log above), so the debug log fills up roughly every 30 seconds.

$ juju debug-log --replay | grep "unit.kafka/2.juju-log cmd failed" | tail -5 
unit-kafka-2: 12:11:29.430 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
unit-kafka-2: 12:59:01.719 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
unit-kafka-2: 12:59:34.206 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
unit-kafka-2: 13:00:06.491 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.
unit-kafka-2: 13:00:38.797 DEBUG unit.kafka/2.juju-log cmd failed - cmd=KAFKA_OPTS=-Djava.security.auth.login.config=/var/snap/charmed-kafka/common/zookeeper-jaas.cfg charmed-kafka.configs --zookeeper=10.30.254.66:2181/kafka --alter --entity-type=users --entity-name=sync --add-config=SCRAM-SHA-512=[password=AbZal5OW65hIdcda9tIxMzKvld4H1qMl] --zk-tls-config-file=/var/snap/charmed-kafka/common/server.properties, stdout=Warning: --zookeeper is deprecated and will be removed in a future version of Kafka.

I realize this may be because my deployment is incomplete, and I was wondering:

marcoppenheimer commented 1 year ago

That does seem strange. It appears that Kafka + ZK is down and that Kafka keeps failing its on_start hook. I think we've had this reported before; we will try to address it soon!

In the meantime, a juju remove-application followed by a juju deploy should fix it.