CiscoCloud / marathon-consul

bridge Marathon information to Consul KV
Apache License 2.0

Connecting to localhost rather than configured registry address #12

Closed roobert closed 9 years ago

roobert commented 9 years ago

Hi,

I'm running marathon-consul as a Docker container deployed by Puppet; here is the process:

CONTAINER ID                                                       IMAGE                        COMMAND                                                                                           CREATED             STATUS              PORTS                                                                                                                                                        NAMES
a9adf6b884bd3475211305f049082cd907447fdee890d36467b5609369541cb7   ciscocloud/marathon-consul   "/launch.sh --marathon-location=172.17.42.1:8080 --registry=172.17.42.1:8500 --log-level=debug"   23 minutes ago      Up 23 minutes       0.0.0.0:4000->4000/tcp                                                                                                                                       marathon-consul

As you can see, I've specified the registry and marathon locations to be the docker bridge IP.

Then I start marathon with the callback flag and create a callback:

curl -X POST 'http://localhost:8080/v2/eventSubscriptions?callbackUrl=http://localhost:4000/events' -v

I've also tried creating a callback with docker bridge IP:

curl -X POST 'http://172.17.42.1:8080/v2/eventSubscriptions?callbackUrl=http://172.17.42.1:4000/events' -v

In both instances when running marathon-consul in debug mode, I see the following:

time="2015-10-18T17:28:22Z" level=info msg="handling event" eventType="status_update_event" 
time="2015-10-18T17:28:22Z" level=debug msg="{\"slaveId\":\"20151017-195725-3111233728-5050-17500-S0\",\"taskId\":\"basic-0.838db8f8-75bd-11e5-b38e-5e84bfba7cb5\",\"taskStatus\":\"TASK_KILLED\",\"message\":\"Command terminated with signal Terminated\",\"appId\":\"/basic-0\",\"host\":\"mesos0\",\"ports\":[20777],\"version\":\"2015-10-18T17:27:27.950Z\",\"eventType\":\"status_update_event\",\"timestamp\":\"2015-10-18T17:28:22.309Z\"}" 
time="2015-10-18T17:29:18Z" level=info msg="handling event" eventType="api_post_event" 
time="2015-10-18T17:29:18Z" level=info msg="[ERROR] response generated error: Get http://127.0.0.1:8500/v1/kv/marathon/basic-0: dial tcp 127.0.0.1:8500: connection refused" 
time="2015-10-18T17:29:18Z" level=debug msg="{\"clientIp\":\"10.2.9.151\",\"uri\":\"/v2/apps//basic-0\",\"appDefinition\":{\"id\":\"/basic-0\",\"cmd\":null,\"args\":null,\"user\":null,\"env\":{},\"instances\":1,\"cpus\":1.0,\"mem\":128.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[],\"uris\":[],\"storeUrls\":[],\"ports\":[0],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":null,\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"acceptedResourceRoles\":null,\"version\":\"2015-10-18T17:29:18.558Z\"},\"eventType\":\"api_post_event\",\"timestamp\":\"2015-10-18T17:29:18.560Z\"}" 
time="2015-10-18T17:29:18Z" level=info msg="not handling event" eventType="group_change_success" 

I can't seem to work out why the query is going to '127.0.0.1:8500' rather than the configured registry address. Any help would be appreciated!

Cheers,

root@mesos0:~# dpkg -l | grep -E 'marathon|mesos'
ii  marathon                            0.9.0-1.0.381.debian77            amd64        Cluster-wide init and control system for services running on Apache Mesos
ii  mesos                               0.22.1-1.0.debian78               amd64        Cluster resource manager with efficient resource isolation
roobert commented 9 years ago

There doesn't seem to have been any activity in this repo since July. @BrianHicks, is this project still active?

Thanks,

stevendborrelli commented 9 years ago

@roobert this project is still active, but it is Sunday. We've been using this project on several clusters.

We're also looking at using http://traefik.github.io instead of marathon -> haproxy-consul. If traefik works out in our clusters, we'll probably reduce support for this project.

roobert commented 9 years ago

@stevendborrelli, ahh, I appreciate it's a Sunday, so thank you for your reply. It's a shame I can't seem to get this going after a few hours of trying different things and poking around in the source code.

Traefik looks really interesting so I'll give that a go, thanks for the tip!

roobert commented 9 years ago

I've had a look at Traefik: it looks great, but also quite new and somewhat limited.

I would love to get marathon-consul going with consul-template and haproxy, so any advice would be greatly appreciated.

stevendborrelli commented 9 years ago

@roobert can you try setting --registry-prefix to http?

Also, we don't recommend running consul in a docker container.

roobert commented 9 years ago

Hi @stevendborrelli, I've just tried with --registry-prefix set, but still get the same error.

I've attached a load more debug info to this gist, thanks very much for your advice so far: https://gist.github.com/roobert/ffd48a43fc73950094a9

I'll give running consul on the host a go, but I'm not sure that will make much difference as the docker instance will still be connecting to its own address.

Cheers,

roobert commented 9 years ago

As expected, no difference.

I can only think that somewhere in the code the config is being overwritten with the default, or that there's a race condition (although that wouldn't explain later refreshes).

The only thing left I can think of to try is changing the default in the code and recompiling, unless you have any other ideas?

Cheers,

ChrisAubuchon commented 9 years ago

Try changing --registry=172.17.42.1:8500 to --registry=http://172.17.42.1:8500. marathon-consul uses the net/url package, which requires a scheme to parse the address properly.
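
For illustration only (a standalone snippet, not marathon-consul's actual code), this shows how Go's net/url treats the registry value with and without a scheme. Without one, the host never comes out of the parse, which is consistent with the Consul client falling back to its default of 127.0.0.1:8500:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// No scheme: depending on the Go version this either parses with an
	// empty Host field or returns an error outright, so a client built
	// from it would fall back to the Consul default of 127.0.0.1:8500.
	u, err := url.Parse("172.17.42.1:8500")
	if err != nil {
		fmt.Println("no scheme:   parse error:", err)
	} else {
		fmt.Printf("no scheme:   host=%q\n", u.Host)
	}

	// With an explicit scheme the host and port come through as expected.
	u, err = url.Parse("http://172.17.42.1:8500")
	if err == nil {
		fmt.Printf("with scheme: host=%q\n", u.Host) // host="172.17.42.1:8500"
	}
}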

roobert commented 9 years ago

Aaaaaaahhhh!!!! That fixed it, thank you kindly, I should've spotted it.

Thanks again and thanks for this great project!

BrianHicks commented 9 years ago

Reopening so we can get a warning in the logs in a future version
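
For reference, a minimal sketch of the kind of startup check that could emit such a warning, assuming the logrus-style logging the debug output above suggests; validateRegistry is a hypothetical helper, not code from this repository:

package main

import (
	"net/url"

	log "github.com/Sirupsen/logrus"
)

// validateRegistry is a hypothetical helper: it warns when the configured
// registry address has no scheme, since net/url will not yield a usable
// host in that case and the Consul client would silently fall back to its
// default address.
func validateRegistry(registry string) string {
	u, err := url.Parse(registry)
	if err != nil || u.Host == "" {
		log.WithField("registry", registry).Warn(
			"registry address has no scheme (e.g. http://); the default Consul address will be used")
		return registry
	}
	return u.Scheme + "://" + u.Host
}

func main() {
	// Example: the flag value from this issue, without and with a scheme.
	log.Info("parsed registry: ", validateRegistry("172.17.42.1:8500"))
	log.Info("parsed registry: ", validateRegistry("http://172.17.42.1:8500"))
}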