canonical / cassandra-k8s-operator

Apache License 2.0

Cassandra fails to restart when adding the monitoring relation #37

Closed mmanciop closed 3 years ago

mmanciop commented 3 years ago

In some cases, when adding the monitoring relation, Cassandra fails to restart with these errors in the pod logs:

2021-08-17T15:05:35.745Z [cassandra] Caused by: picocli.CommandLine$ExecutionException: Error while calling command (com.zegelin.cassandra.exporter.Agent@2f9f7dcf): java.net.BindException: Address already in use
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine.execute(CommandLine.java:1016)
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine.access$900(CommandLine.java:142)
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine$RunLast.handle(CommandLine.java:1199)
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine$RunLast.handle(CommandLine.java:1167)
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1075)
2021-08-17T15:05:35.745Z [cassandra]     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1358)
2021-08-17T15:05:35.745Z [cassandra]     at com.zegelin.cassandra.exporter.Agent.premain(Agent.java:52)
2021-08-17T15:05:35.745Z [cassandra]     ... 6 more
2021-08-17T15:05:35.745Z [cassandra] Caused by: java.net.BindException: Address already in use
2021-08-17T15:05:35.745Z [cassandra]     at sun.nio.ch.Net.bind0(Native Method)
2021-08-17T15:05:35.745Z [cassandra]     at sun.nio.ch.Net.bind(Net.java:461)
2021-08-17T15:05:35.745Z [cassandra]     at sun.nio.ch.Net.bind(Net.java:453)
2021-08-17T15:05:35.745Z [cassandra]     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
2021-08-17T15:05:35.745Z [cassandra]     at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
2021-08-17T15:05:35.745Z [cassandra]     at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
2021-08-17T15:05:35.745Z [cassandra]     at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:496)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:481)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
2021-08-17T15:05:35.746Z [cassandra]     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
2021-08-17T15:05:35.746Z [cassandra]     at java.lang.Thread.run(Thread.java:748)

The Pod then remains a "zombie": Juju reports it as active, but no productive workload is actually running inside it.
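One quick way to check whether "Address already in use" really means another listener is still holding the exporter port is to probe that port from inside the container before Cassandra restarts. The sketch below is only an illustration. The helper name port_in_use is made up here, and the port has to be supplied by whoever runs it, since the logs above do not say which port the exporter binds. It relies on Bash's /dev/tcp redirection, so it needs Bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the charm): exits 0 if a TCP connection
# to host:port can be opened, i.e. some process is already listening there.
port_in_use() {
    local host="$1" port="$2"
    # The subshell opens file descriptor 3 on the socket; the descriptor is
    # closed automatically when the subshell exits. Errors are silenced.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}
```

Calling it as port_in_use 127.0.0.1 PORT (with PORT replaced by the exporter port in your deployment) and checking the exit status would show whether an old process is still bound. A successful connect only proves that a listener exists; it does not identify which process owns it.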

mmanciop commented 3 years ago

Actually, my tests suggest that the units all end up in the zombie state even when they are created while the monitoring relation is already established.

dstathis commented 3 years ago

This should be fixed in the latest edge release. Are you still running into this, @mmanciop?

PS: The fact that the unit still reports active in this case is a Pebble limitation. The Pebble team is working on it.

mmanciop commented 3 years ago

I have not tried in a while, but it should not be hard for you to reproduce with a loop such as: while true; do juju {add|remove}-relation cassandra prometheus; sleep 10s; done (in Bash or similar).
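Spelled out, the loop from the comment above might look like the sketch below. The two juju commands are the ones named in the comment. The function name toggle_relation, the bounded number of rounds and the default values are assumptions added here so the loop can be stopped and reused.

```shell
#!/bin/sh
# Hypothetical helper: repeatedly add and remove the cassandra/prometheus
# relation to try to trigger the failed restart described in this issue.
toggle_relation() {
    rounds="${1:-10}"   # number of add/remove cycles (assumed default)
    pause="${2:-10}"    # seconds to wait after each juju command
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # "|| true" keeps the loop going if juju reports that the relation
        # is already in the requested state.
        juju add-relation cassandra prometheus || true
        sleep "$pause"
        juju remove-relation cassandra prometheus || true
        sleep "$pause"
        i=$((i + 1))
    done
}
```

Running toggle_relation 20 10 would perform twenty add/remove cycles with a ten-second pause after each command. Watching the pod logs in parallel should show whether the BindException reappears.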

dstathis commented 3 years ago

It is fixed in my rewrite branch. I'll close this issue when it merges.