Netflix / mantis

A platform that makes it easy for developers to build realtime, cost-effective, operations-focused applications
Apache License 2.0

Crashing containers following official docker-compose getting started guide #605

Open jpittis opened 11 months ago

jpittis commented 11 months ago

Summary

Hey folks. I'm encountering crashing containers while following the official docker-compose getting started guide:

$ mkdir mantis
$ cd mantis
$ wget https://raw.githubusercontent.com/Netflix/mantis/master/docker-compose.yml
$ docker-compose -f docker-compose.yml up

I can see logs showing that mantisagent, mantismaster, and mantisapi all exit with non-zero status codes (in that order):

mantis-compose-mantisagent-1 exited with code 2
mantis-compose-mantismaster-1 exited with code 1
mantis-compose-mantisapi-1 exited with code 1

I'm running this on macOS Sonoma (aarch64), but I have been able to reproduce it on Linux (amd64) as well.

Details

Right before the crash of the agent and the master, I see the following log line from ZooKeeper:

mantis-compose-zookeeper-1         | 2023-12-13 01:40:04,582 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@675] - Received packet at server of unknown type 19

Other potential culprit logs (there are too many to dump all of them here) include:

mantis-compose-mantisagent-1       | 2023-12-13 01:40:03 ERROR AgentV2Main:114 - Unexpected error: java.lang.IllegalStateException: Expected the service TaskExecutorStarter [FAILED] to be RUNNING, but the service has FAILED
mantis-compose-mantisagent-1       | Caused by: io.mantisrx.shaded.org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /mantis/master
mantis-compose-mantismaster-1      | 2023-12-13 01:40:04 ERROR MasterMain:269 - caught exception on Mantis Master initialization
mantis-compose-mantismaster-1      | java.lang.RuntimeException: java.lang.IllegalStateException: Expected the service ZookeeperMasterMonitor [FAILED] to be RUNNING, but the service has FAILED
mantis-compose-mantismaster-1      | Caused by: io.mantisrx.shaded.org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /mantis/master
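
Since the full output is too long to paste, one way to pull out lines like the ones above is to filter the combined compose logs. This is a minimal sketch, not part of the original report: the function name `filter_culprits` and the choice of patterns are assumptions based only on the log lines quoted here.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look like errors,
# exception causes, or the ZooKeeper "unknown type" warning.
filter_culprits() {
    grep -E 'ERROR|Caused by:|KeeperException|unknown type'
}

# Usage (assumes the stack from docker-compose.yml has already been started):
#   docker-compose -f docker-compose.yml logs --no-color | filter_culprits
```

The `--no-color` flag keeps ANSI escape codes out of the piped output, which makes the filtered lines easier to paste into an issue.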

jpittis commented 11 months ago

After encountering this thread, I decided to try bumping the ZooKeeper version to something more recent (specifically "jplock/zookeeper:3.5.5"), and that seems to have resolved the issue. I'll be following up with a PR to bump the docker-compose file, at which point I believe we can close this issue.
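
As a rough sketch of the workaround described above, the image tag can be rewritten in a local copy of the compose file before starting the stack. The helper name `bump_zookeeper_image` is hypothetical, and the sketch assumes the file declares the ZooKeeper service with an unquoted `image: jplock/zookeeper` line (with or without an existing tag); check the actual file before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: set the tag of the jplock/zookeeper image in a compose file.
# Assumption: the image is declared as an unquoted "image: jplock/zookeeper[:tag]" line.
bump_zookeeper_image() {
    compose_file="$1"
    new_tag="$2"
    # Write to a temp file instead of using `sed -i`, whose syntax
    # differs between GNU sed (Linux) and BSD sed (macOS).
    sed -E "s|(image:[[:space:]]*jplock/zookeeper)(:[^[:space:]]*)?[[:space:]]*\$|\1:${new_tag}|" \
        "$compose_file" > "$compose_file.tmp" && mv "$compose_file.tmp" "$compose_file"
}

# Usage:
#   bump_zookeeper_image docker-compose.yml 3.5.5
#   docker-compose -f docker-compose.yml up
```

Other `image:` lines are left untouched, so only the ZooKeeper service changes.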

calvin681 commented 11 months ago

Hello, we have moved to using Helm for running containers. You can follow the README in this repo:

https://github.com/Netflix/mantis-helm

It should start all the necessary containers in Kubernetes.

jpittis commented 11 months ago

Sweet, thanks for sharing the preferred approach @calvin681.

Does that imply we don't want the docker-compose file fixed?

In that case, should we delete it and update the getting started guide to point to mantis-helm instead?

jpittis commented 11 months ago

@calvin681: Separately, is there a chance that my issues building from source (https://github.com/Netflix/mantis/issues/604) are related to the fact that developers build using some special process/env that's also out of date with the official docs?

calvin681 commented 11 months ago

Yes, we should delete the docker-compose file and update the docs.

We still use Gradle to build; see this line in our GitHub Actions workflow: https://github.com/Netflix/mantis/blob/master/.github/workflows/nebula-ci.yml#L39

If that's failing, then it's likely a bug.

jpittis commented 11 months ago

How about we keep this ticket open until we fix up the docs? I should have an opportunity to go through the Helm process later this week, and it shouldn't be much extra work to fix up the docs at the same time.