Closed hokiegeek2 closed 7 years ago
@hokiegeek2 thanks for reporting. The endless "I'm not a leader" loop occurs because the colocated Marathon instance is not the leader. You can disable this leadership check with
--marathon-leader
Marathon cluster-wide node name (defaults to <hostname>:8080); some leader-specific calls will be made only if the specified node is the current Marathon leader. Set to * to always act like a leader.
Is this working for you?
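For readers following the thread, the leadership check described above can be sketched roughly as follows. This is a minimal illustration, not marathon-consul's actual implementation: the helper names parseLeader and actsAsLeader are hypothetical, and the sample host names are made up. It assumes only that Marathon's /v2/leader endpoint returns a JSON body of the form {"leader":"host:port"}.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderResponse mirrors the JSON body returned by Marathon's GET /v2/leader,
// for example {"leader":"marathon-1.example.com:8080"}.
type leaderResponse struct {
	Leader string `json:"leader"`
}

// parseLeader (hypothetical helper) extracts the leader address
// from a /v2/leader response body.
func parseLeader(body []byte) (string, error) {
	var resp leaderResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding leader response: %w", err)
	}
	return resp.Leader, nil
}

// actsAsLeader (hypothetical helper) decides whether leader-specific calls
// should run. configured is the value of --marathon-leader; "*" disables
// the check, otherwise it must match the current leader exactly.
func actsAsLeader(configured, current string) bool {
	if configured == "*" {
		return true
	}
	return configured != "" && configured == current
}

func main() {
	// Sample response body; the host name is illustrative only.
	current, err := parseLeader([]byte(`{"leader":"marathon-1.example.com:8080"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(actsAsLeader("*", current))
	fmt.Println(actsAsLeader("marathon-1.example.com:8080", current))
	fmt.Println(actsAsLeader("marathon-2.example.com:8080", current))
}
```

Under this sketch, a node name that differs from the reported leader in any way (host name versus IP, missing port) would fail the comparison, which is one thing worth checking when the loop appears.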
@janisz Cool, I will try that.
@janisz I tried setting marathon-leader=* and that immediately caused the "I'm not a leader" loop.
Here's my configuration:
--sse-enabled=true
--web-enabled=false
--marathon-leader=<auto lookup via localhost:8080/v2/leader upon service startup>
--marathon-location=<auto lookup of node upon service startup>
with default setting of sync-enabled=true
Are you using the latest version?
Yes, I grabbed the binary from github
The configuration you posted does not use *. Is that correct? Are you sure you used this setting properly? You should see a debug log message informing you that leader detection is disabled.
@hokiegeek2 Is the issue solved?
@janisz I did use one configuration where marathon-leader = *, but that's the one that immediately sent it into the "I am not a leader" loop. The configuration I posted above is the one I am using where marathon-consul eventually goes into the "I am not a leader" loop.
One thing I tried over the last day is to have just one marathon-consul running (had a cluster of four previously). With just one I am not seeing any "I am not a leader" looping issues. I've previously seen the recommendation for running just one marathon-consul instance on the marathon-consul README. Is it a problem to run 2..n marathon-consul instances? I am doing that for HA.
--John
@janisz My apologies, I actually had 1.3.1. I just grabbed 1.3.3 and will test today.
No problem. Please confirm if it's working for you. Any ideas on how to improve the README are welcome.
This version fixes the problem of failing to find the leader. However, in the process of troubleshooting this, I still see a series of marathon-consul restarts due to failure to connect to the co-located marathon. Strangely, at the same time marathon-consul logs that it cannot connect to the co-located marathon I can curl the marathon REST API just fine. I am not saying this is a problem with marathon-consul, but this is an issue in my deployment environment.
Connection error to co-located Marathon server instance results in marathon-consul restart after 5 connection errors with the following message:
This message is followed by:
This loop happens three times and then, on the fourth iteration, marathon-consul gets into an endless loop of the following:
When I discovered this last round of error messages was occurring, I was able to curl this URL from the host box, so it seems marathon-consul got into a state where the leader check failed even after connectivity to the co-located Marathon instance returned.
I recommend making a plugin framework that enables different options for handling failure of the co-located Marathon instance. One possibility would be to allow passing a comma-delimited list of Marathon URLs, analogous to a list of ZooKeeper hosts.
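To make the comma-delimited URL suggestion concrete, here is a rough sketch of how such a list could be parsed and probed in order. It is only an illustration of the proposal, not an existing marathon-consul feature: splitMarathonURLs and firstReachable are hypothetical names, and probing /v2/leader as a health check is an assumption made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// splitMarathonURLs (hypothetical helper) parses a comma-delimited list of
// Marathon base URLs, skipping empty entries and trimming trailing slashes.
func splitMarathonURLs(list string) []string {
	var urls []string
	for _, part := range strings.Split(list, ",") {
		if u := strings.TrimSpace(part); u != "" {
			urls = append(urls, strings.TrimRight(u, "/"))
		}
	}
	return urls
}

// firstReachable (hypothetical helper) returns the first base URL whose
// /v2/leader endpoint answers with 200 OK, trying the candidates in order.
func firstReachable(client *http.Client, urls []string) (string, error) {
	for _, base := range urls {
		resp, err := client.Get(base + "/v2/leader")
		if err != nil {
			continue // connection failed, try the next candidate
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			return base, nil
		}
	}
	return "", errors.New("no reachable Marathon instance")
}

func main() {
	// A local stand-in for a healthy Marathon node, so the sketch runs offline.
	healthy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"leader":"stand-in:8080"}`)
	}))
	defer healthy.Close()

	// The first entry points at a port where nothing is expected to listen.
	list := "http://127.0.0.1:1, " + healthy.URL
	client := &http.Client{Timeout: 2 * time.Second}

	base, err := firstReachable(client, splitMarathonURLs(list))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using Marathon at", base)
}
```

With something along these lines, a connection error to the co-located instance would move on to the next URL instead of forcing a restart.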