bio-guoda / guoda-services

Services provided by GUODA, currently a container for tickets and wikis.
MIT License

marathon no longer running, causing api outage #52

Closed: jhpoelen closed this issue 6 years ago

jhpoelen commented 6 years ago

Marathon, the service that runs the API and the Spark job dispatcher, is no longer running, for reasons unknown.

mjcollin commented 6 years ago

Seems to have lost consistency:

HTTP ERROR: 503
Problem accessing /. Reason:

    Could not determine the current leader
Powered by Jetty:// 9.3.z-SNAPSHOT

A restart on all nodes has brought it back up. Two possible causes: (1) the last log entries were requests from the UFIT security scanner (a classic destroyer of services), and (2) on Tuesday I filled up HDFS for about an hour, though Marathon proper doesn't use HDFS for anything; only its services do.
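
To catch the "Could not determine the current leader" state sooner, something like the following could be run against each node. This is only a rough sketch, not what is deployed here: it assumes Marathon's `/v2/leader` REST endpoint is reachable over plain HTTP, and the function name `marathon_leader` and its `opener` parameter are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def marathon_leader(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the leader "host:port" that Marathon reports, or None.

    None means no leader could be determined: the node is unreachable,
    it answered with an HTTP error (such as the 503 above), or the
    response body was not the expected JSON.
    `opener` is a hypothetical hook so the check can be tried offline.
    """
    url = f"{base_url.rstrip('/')}/v2/leader"
    try:
        with opener(url, timeout=timeout) as resp:
            return json.load(resp).get("leader")
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; ValueError
        # covers a body that is not valid JSON.
        return None
```

Calling `marathon_leader("http://<node>:8080")` for every node and alerting when all of them return `None` would flag the leaderless state. The port and URL scheme are assumptions and need to match the actual Marathon configuration.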

jhpoelen commented 6 years ago

Thanks for responding. Looks like Marathon and the API are back up; see the attached screenshot (from 2018-07-05 07-44-03). Curious to learn more about the root causes.