CiscoCloud / marathon-consul

bridge Marathon information to Consul KV
Apache License 2.0

marathon-consul does not reconcile running apps when starting up #3

Closed hellertime closed 9 years ago

hellertime commented 9 years ago

If an app is started by marathon before marathon-consul is running, marathon-consul does not update the consul state with the state of this app until the app is restarted.

This can be observed by stopping an app, stopping marathon-consul, starting the previously stopped app, then starting marathon-consul. Since no event is generated, the consul state is not changed.

I think a useful remedy would be to have marathon-consul query the state of all apps at startup, and replace any values under the /kv/marathon path with the current state of the world. In the case where nothing has changed this should be idempotent, but in the case where an app's state changed while marathon-consul was away, this will get things back on track.
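A minimal sketch of that startup reconciliation, in Go. It only computes which keys would need writing or deleting and talks to neither Marathon nor Consul. The `Plan` type, the `reconcile` function, and the `marathon/<app id>` key layout are assumptions made for illustration, not marathon-consul's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Plan lists the KV writes and deletes needed to make the store match Marathon.
// (Hypothetical type, for illustration only.)
type Plan struct {
	Puts    map[string]string
	Deletes []string
}

// reconcile compares the desired state (app ID -> serialized app, as reported
// by Marathon) with the keys currently stored under prefix. The key layout
// prefix + "/" + app ID is an assumption.
func reconcile(prefix string, desired, stored map[string]string) Plan {
	plan := Plan{Puts: map[string]string{}}

	// Translate app IDs into the keys they should occupy.
	want := map[string]string{}
	for id, body := range desired {
		want[prefix+"/"+strings.Trim(id, "/")] = body
	}

	// Write only keys that are missing or stale, so a second run is a no-op.
	for key, body := range want {
		if old, ok := stored[key]; !ok || old != body {
			plan.Puts[key] = body
		}
	}

	// Delete keys for apps that Marathon no longer reports.
	for key := range stored {
		if _, ok := want[key]; !ok {
			plan.Deletes = append(plan.Deletes, key)
		}
	}
	sort.Strings(plan.Deletes)
	return plan
}

func main() {
	desired := map[string]string{"/web": `{"instances":2}`}
	stored := map[string]string{
		"marathon/web": `{"instances":1}`, // changed while marathon-consul was away
		"marathon/old": `{"instances":1}`, // app removed while marathon-consul was away
	}
	plan := reconcile("marathon", desired, stored)
	fmt.Println(len(plan.Puts), plan.Deletes)
}
```

Running the same comparison again after applying the plan yields an empty plan, which is the idempotence property described above.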

stevendborrelli commented 9 years ago

@hellertime thanks for pointing this out. We've seen a similar issue with registrator, where unless you enable resync in master, consul will start to get populated by extra entries.

hellertime commented 9 years ago

Sure thing. I actually switched away from registrator for this reason. It was hard to get it to reconcile state correctly -- at least with marathon-consul I can just wipe out the /kv/marathon keys recursively for now!

Gingonic commented 9 years ago

I would recommend resync functionality too. If for some reason an event is not correctly transmitted to consul, the registry will miss the entry for good. As far as I know, consul should be able to detect that an endpoint simply does not exist anymore and self-heal. But even then, because of marathon's automated port allocation, the port on a specific host might have been reassigned to a new service before the first error occurs. Unlikely, but possible, and good luck debugging that... Resyncing once in a while should prevent this kind of problem from persisting for too long.

BrianHicks commented 9 years ago

all above: yes, I agree! Now that we've merged #4, I'm about to get started on this. Work will be tracked on feature/resync.

hellertime commented 9 years ago

@BrianHicks looking forward to this feature! Thanks.

A more general question: once marathon supports the Mesos service discovery info messages, what are the plans for marathon-consul and mesos-consul? Will there still be a need for both?

BrianHicks commented 9 years ago

There may still be a need for both, we'll see when we get there. We don't have super well defined plans at the moment, but that functionality would probably go in mesos-consul.

BrianHicks commented 9 years ago

@hellertime @Gingonic this is ready to go but needs testing. Would you be able to test #6?

hellertime commented 9 years ago

Tried out #6 but I'm not seeing an improvement, though I think it is a simple error. I ran with --log-level=debug and have included the output here:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   164  100   164    0     0  11414      0 --:--:-- --:--:-- --:--:-- 12615
time="2015-06-25T15:20:15Z" level=info msg=listening port=":4000" 
time="2015-06-25T15:20:15Z" level=info msg="syncing apps" 
time="2015-06-25T15:20:15Z" level=debug msg="asking Marathon for apps" location=marathon.shaka.r.tn.a.com 
I0625 15:20:15.598287 16705 logging.cpp:172] INFO level logging started!
I0625 15:20:15.604545 16705 exec.cpp:132] Version: 0.22.1
I0625 15:20:15.606587 16716 exec.cpp:206] Executor registered on slave 20150624-121815-3315733190-5050-16590-S0
time="2015-06-25T15:20:15Z" level=info msg="syncing tasks" 
time="2015-06-25T15:20:15Z" level=debug msg="syncing tasks for app" app="/bootstrap/image-deployment/docker-registry" 
time="2015-06-25T15:20:15Z" level=debug msg="asking Marathon for tasks" app="/bootstrap/image-deployment/docker-registry" location=marathon.shaka.r.tn.a.com 
time="2015-06-25T15:20:15Z" level=error msg="invalid character 'b' looking for beginning of value" location=marathon.shaka.r.tn.a.com statusCode="È" 
time="2015-06-25T15:20:16Z" level=info msg="not handling event" eventType="subscribe_event" 
time="2015-06-25T15:20:16Z" level=debug msg="{\"clientIp\":\"198.18.162.197\",\"callbackUrl\":\"http://osd07.shaka.r.tn.a.com:31000/events\",\"eventType\":\"subscribe_event\",\"timestamp\":\"2015-06-25T15:20:20.547Z\"}" 
time="2015-06-25T15:20:16Z" level=info msg="handling event" eventType="status_update_event" 
time="2015-06-25T15:20:16Z" level=debug msg="{\"slaveId\":\"20150624-121815-3315733190-5050-16590-S0\",\"taskId\":\"bootstrap_service-discovery_marathon-consul.b11cbb5c-1b4d-11e5-8727-56847afe9799\",\"taskStatus\":\"TASK_RUNNING\",\"message\":\"\",\"appId\":\"/bootstrap/service-discovery/marathon-consul\",\"host\":\"osd07.shaka.r.tn.a.com\",\"ports\":[31000],\"version\":\"2015-06-25T15:20:19.256Z\",\"eventType\":\"status_update_event\",\"timestamp\":\"2015-06-25T15:20:20.659Z\"}" 
time="2015-06-25T15:20:18Z" level=info msg="not handling event" eventType="remove_health_check_event" 
time="2015-06-25T15:20:18Z" level=debug msg="{\"appId\":\"/bootstrap/service-discovery/marathon-consul\",\"eventType\":\"remove_health_check_event\",\"timestamp\":\"2015-06-25T15:20:23.267Z\"}" 
time="2015-06-25T15:20:18Z" level=info msg="not handling event" eventType="remove_health_check_event" 
time="2015-06-25T15:20:18Z" level=debug msg="{\"appId\":\"/bootstrap/service-discovery/marathon-consul\",\"eventType\":\"remove_health_check_event\",\"timestamp\":\"2015-06-25T15:20:23.268Z\"}" 
time="2015-06-25T15:21:14Z" level=info msg="not handling event" eventType="health_status_changed_event" 
time="2015-06-25T15:21:14Z" level=debug msg="{\"appId\":\"/bootstrap/service-discovery/marathon-consul\",\"taskId\":\"bootstrap_service-discovery_marathon-consul.b11cbb5c-1b4d-11e5-8727-56847afe9799\",\"version\":\"2015-06-25T15:20:19.256Z\",\"alive\":true,\"eventType\":\"health_status_changed_event\",\"timestamp\":\"2015-06-25T15:21:19.473Z\"}" 
time="2015-06-25T15:21:14Z" level=info msg="not handling event" eventType="deployment_success" 
time="2015-06-25T15:21:14Z" level=debug msg="{\"id\":\"db9baf2f-46bf-49e3-b12d-0b7918910db3\",\"eventType\":\"deployment_success\",\"timestamp\":\"2015-06-25T15:21:19.474Z\"}" 
time="2015-06-25T15:21:14Z" level=info msg="not handling event" eventType="deployment_step_success" 
time="2015-06-25T15:21:14Z" level=debug msg="{\"plan\":{\"id\":\"db9baf2f-46bf-49e3-b12d-0b7918910db3\",\"original\":{\"id\":\"/\",\"apps\":[{\"id\":\"/spark-cluster\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":4.0,\"mem\":2048.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10007],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[{\"containerPath\":\"/a\",\"hostPath\":\"/usr/local/a\",\"mode\":\"RO\"}],\"docker\":{\"image\":\"registry.shaka.r.tn.a.com/appimage-spark-cluster:1.10\",\"privileged\":false,\"parameters\":[],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T12:03:43.957Z\"}],\"groups\":[{\"id\":\"/ingest\",\"apps\":[],\"groups\":[{\"id\":\"/ingest/collection\",\"apps\":[{\"id\":\"/ingest/collection/rt-collector\",\"cmd\":null,\"args\":[\"--verbose\"],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":2.0,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10003,10004],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"registry.shaka.r.tn.a.com/appimage-b-collector:1.11\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":8088,\"hostPort\":0,\"servicePort\":10003,\"protocol\":\"tcp\"},{\"containerPort\":8081,\"hostPort\":0,\"servicePort\":10004,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_TOPIC=r_rt_ingress\"},{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_PATH_PREFIX=kafka.k1\"}],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"m
inimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-24T16:17:42.807Z\"},{\"id\":\"/ingest/collection/nt-collector\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":2.0,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10005,10006],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"registry.shaka.r.tn.a.com/appimage-b-collector:1.11\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":8088,\"hostPort\":0,\"servicePort\":10005,\"protocol\":\"tcp\"},{\"containerPort\":8081,\"hostPort\":0,\"servicePort\":10006,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_TOPIC=r_nt_ingress\"},{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_PATH_PREFIX=kafka.k1\"}],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-24T16:17:42.807Z\"}],\"groups\":[],\"dependencies\":[],\"version\":\"2015-06-25T15:20:00.084Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:00.084Z\"},{\"id\":\"/bootstrap\",\"apps\":[],\"groups\":[{\"id\":\"/bootstrap/image-deployment\",\"apps\":[{\"id\":\"/bootstrap/image-deployment/docker-registry\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":3,\"cpus\":0.25,\"mem\":256.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10002],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"baseimage-docker-registry:0.9.1\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":5000,\"hostPort\":0,\"servicePort\":10
002,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"SETTINGS_FLAVOR=ceph-s3\"},{\"key\":\"env\",\"value\":\"AWS_ENCRYPT=false\"},{\"key\":\"env\",\"value\":\"AWS_SECURE=false\"},{\"key\":\"env\",\"value\":\"AWS_BUCKET=docker-registry\"},{\"key\":\"env\",\"value\":\"AWS_HOST=radosgw.shaka.r.tn.a.com\"},{\"key\":\"env\",\"value\":\"AWS_PORT=80\"},{\"key\":\"env\",\"value\":\"AWS_KEY=E0OA8OV0BH9V5RCAZTNF\"},{\"key\":\"env\",\"value\":\"AWS_SECRET=N6yP0iDQJqhHkUOanBvFFmfH14M2lptLr9yKCP0f\"},{\"key\":\"env\",\"value\":\"STORAGE_PATH=/registry\"}],\"forcePullImage\":false}},\"healthChecks\":[{\"path\":\"/v1/_ping\",\"protocol\":\"HTTP\",\"portIndex\":0,\"gracePeriodSeconds\":300,\"intervalSeconds\":60,\"timeoutSeconds\":20,\"maxConsecutiveFailures\":3,\"ignoreHttp1xx\":false}],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T14:22:21.085Z\"}],\"groups\":[],\"dependencies\":[\"/bootstrap/service-discovery\"],\"version\":\"2015-06-25T15:20:00.084Z\"},{\"id\":\"/bootstrap/service-discovery\",\"apps\":[{\"id\":\"/bootstrap/service-discovery/mesos-consul\",\"cmd\":null,\"args\":[\"--registry-ssl\",\"--registry-ssl-verify=false\",\"--registry-port=8501\"],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":0.1,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10001],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[{\"containerPath\":\"/a\",\"hostPath\":\"/usr/local/a\",\"mode\":\"RO\"}],\"docker\":{\"image\":\"baseimage-mesos-consul:1.1\",\"network\":\"BRIDGE\",\"privileged\":false,\"parameters\":[],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-
25T14:22:21.085Z\"},{\"id\":\"/bootstrap/service-discovery/marathon-consul\",\"cmd\":null,\"args\":[\"--registry-noverify\",\"--registry=https://consul.service.consul:8501\",\"--log-level=debug\",\"--marathon-location=marathon.shaka.r.tn.a.com\"],\"user\":null,\"env\":{},\"instances\":0,\"cpus\":0.1,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10000],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"baseimage-marathon-consul:1.2\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":4000,\"hostPort\":0,\"servicePort\":10000,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"MARATHON_HOST=marathon.shaka.r.tn.a.com\"}],\"forcePullImage\":false}},\"healthChecks\":[{\"path\":\"/health\",\"protocol\":\"HTTP\",\"portIndex\":0,\"gracePeriodSeconds\":300,\"intervalSeconds\":60,\"timeoutSeconds\":20,\"maxConsecutiveFailures\":3,\"ignoreHttp1xx\":false}],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T15:20:00.084Z\"}],\"groups\":[],\"dependencies\":[],\"version\":\"2015-06-25T15:20:00.084Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:00.084Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:00.084Z\"},\"target\":{\"id\":\"/\",\"apps\":[{\"id\":\"/spark-cluster\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":4.0,\"mem\":2048.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10007],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[{\"containerPath\":\"/a\",\"hostPath\":\"/usr/local/a\",\"mode\":\"RO\"}],\"docker\":{\"image\":\"registry.sha
ka.r.tn.a.com/appimage-spark-cluster:1.10\",\"privileged\":false,\"parameters\":[],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T12:03:43.957Z\"}],\"groups\":[{\"id\":\"/ingest\",\"apps\":[],\"groups\":[{\"id\":\"/ingest/collection\",\"apps\":[{\"id\":\"/ingest/collection/rt-collector\",\"cmd\":null,\"args\":[\"--verbose\"],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":2.0,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10003,10004],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"registry.shaka.r.tn.a.com/appimage-b-collector:1.11\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":8088,\"hostPort\":0,\"servicePort\":10003,\"protocol\":\"tcp\"},{\"containerPort\":8081,\"hostPort\":0,\"servicePort\":10004,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_TOPIC=r_rt_ingress\"},{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_PATH_PREFIX=kafka.k1\"}],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-24T16:17:42.807Z\"},{\"id\":\"/ingest/collection/nt-collector\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":2.0,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10005,10006],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"registry.shaka.r.tn.a.com/appimage-b-collector:1.11\",\"network\":\"BRIDGE\",\"portMappings\
":[{\"containerPort\":8088,\"hostPort\":0,\"servicePort\":10005,\"protocol\":\"tcp\"},{\"containerPort\":8081,\"hostPort\":0,\"servicePort\":10006,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_TOPIC=r_nt_ingress\"},{\"key\":\"env\",\"value\":\"COLLECTOR_KAFKA_PATH_PREFIX=kafka.k1\"}],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-24T16:17:42.807Z\"}],\"groups\":[],\"dependencies\":[],\"version\":\"2015-06-25T15:20:19.256Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:19.256Z\"},{\"id\":\"/bootstrap\",\"apps\":[],\"groups\":[{\"id\":\"/bootstrap/image-deployment\",\"apps\":[{\"id\":\"/bootstrap/image-deployment/docker-registry\",\"cmd\":null,\"args\":[],\"user\":null,\"env\":{},\"instances\":3,\"cpus\":0.25,\"mem\":256.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10002],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"baseimage-docker-registry:0.9.1\",\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":5000,\"hostPort\":0,\"servicePort\":10002,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"SETTINGS_FLAVOR=ceph-s3\"},{\"key\":\"env\",\"value\":\"AWS_ENCRYPT=false\"},{\"key\":\"env\",\"value\":\"AWS_SECURE=false\"},{\"key\":\"env\",\"value\":\"AWS_BUCKET=docker-registry\"},{\"key\":\"env\",\"value\":\"AWS_HOST=radosgw.shaka.r.tn.a.com\"},{\"key\":\"env\",\"value\":\"AWS_PORT=80\"},{\"key\":\"env\",\"value\":\"AWS_KEY=E0OA8OV0BH9V5RCAZTNF\"},{\"key\":\"env\",\"value\":\"AWS_SECRET=N6yP0iDQJqhHkUOanBvFFmfH14M2lptLr9yKCP0f\"},{\"key\":\"env\",\"value\":\"STORAGE_PATH=/registry\"}],\"forcePullImage\":false}},\"healthChecks\":[{\"pat
h\":\"/v1/_ping\",\"protocol\":\"HTTP\",\"portIndex\":0,\"gracePeriodSeconds\":300,\"intervalSeconds\":60,\"timeoutSeconds\":20,\"maxConsecutiveFailures\":3,\"ignoreHttp1xx\":false}],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T14:22:21.085Z\"}],\"groups\":[],\"dependencies\":[\"/bootstrap/service-discovery\"],\"version\":\"2015-06-25T15:20:19.256Z\"},{\"id\":\"/bootstrap/service-discovery\",\"apps\":[{\"id\":\"/bootstrap/service-discovery/mesos-consul\",\"cmd\":null,\"args\":[\"--registry-ssl\",\"--registry-ssl-verify=false\",\"--registry-port=8501\"],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":0.1,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10001],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[{\"containerPath\":\"/a\",\"hostPath\":\"/usr/local/a\",\"mode\":\"RO\"}],\"docker\":{\"image\":\"baseimage-mesos-consul:1.1\",\"network\":\"BRIDGE\",\"privileged\":false,\"parameters\":[],\"forcePullImage\":false}},\"healthChecks\":[],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T14:22:21.085Z\"},{\"id\":\"/bootstrap/service-discovery/marathon-consul\",\"cmd\":null,\"args\":[\"--registry-noverify\",\"--registry=https://consul.service.consul:8501\",\"--log-level=debug\",\"--marathon-location=marathon.shaka.r.tn.a.com\"],\"user\":null,\"env\":{},\"instances\":1,\"cpus\":0.1,\"mem\":64.0,\"disk\":0.0,\"executor\":\"\",\"constraints\":[[\"hostname\",\"UNIQUE\"]],\"uris\":[],\"storeUrls\":[],\"ports\":[10000],\"requirePorts\":false,\"backoffSeconds\":1,\"backoffFactor\":1.15,\"maxLaunchDelaySeconds\":3600,\"container\":{\"type\":\"DOCKER\",\"volumes\":[],\"docker\":{\"image\":\"baseimage-marathon-consul:1.2\",
\"network\":\"BRIDGE\",\"portMappings\":[{\"containerPort\":4000,\"hostPort\":0,\"servicePort\":10000,\"protocol\":\"tcp\"}],\"privileged\":false,\"parameters\":[{\"key\":\"env\",\"value\":\"MARATHON_HOST=marathon.shaka.r.tn.a.com\"}],\"forcePullImage\":false}},\"healthChecks\":[{\"path\":\"/health\",\"protocol\":\"HTTP\",\"portIndex\":0,\"gracePeriodSeconds\":300,\"intervalSeconds\":60,\"timeoutSeconds\":20,\"maxConsecutiveFailures\":3,\"ignoreHttp1xx\":false}],\"dependencies\":[],\"upgradeStrategy\":{\"minimumHealthCapacity\":1.0,\"maximumOverCapacity\":1.0},\"labels\":{},\"version\":\"2015-06-25T15:20:19.256Z\"}],\"groups\":[],\"dependencies\":[],\"version\":\"2015-06-25T15:20:19.256Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:19.256Z\"}],\"dependencies\":[],\"version\":\"2015-06-25T15:20:19.256Z\"},\"steps\":[{\"actions\":[{\"type\":\"ScaleApplication\",\"app\":\"/bootstrap/service-discovery/marathon-consul\"}]}],\"version\":\"2015-06-25T15:20:19.256Z\"},\"currentStep\":{\"actions\":[{\"type\":\"ScaleApplication\",\"app\":\"/bootstrap/service-discovery/marathon-consul\"}]},\"eventType\":\"deployment_step_success\",\"timestamp\":\"2015-06-25T15:21:19.474Z\"}" 

Note the line:

time="2015-06-25T15:20:15Z" level=error msg="invalid character 'b' looking for beginning of value" location=marathon.shaka.r.tn.a.com statusCode="È" 

BrianHicks commented 9 years ago

Interesting! Are you using HTTP or HTTPS? That looks like the beginning of an HTTPS response. There's a new flag, --marathon-protocol. Is that set to the right value for your environment?

hellertime commented 9 years ago

This is HTTP. It is via an HAProxy which does have SSL enabled, but this particular backend is all HTTP.

BrianHicks commented 9 years ago

Could you post your complete command string? I can try to replicate. Any non-ascii characters in any of your app names?

hellertime commented 9 years ago

Sure, the full command is (I'm using a Docker container, forked from the Docker Hub project):

--registry-noverify --registry=https://consul.service.consul:8501 --log-level=debug

also MARATHON_HOST is set in the environment, but I'm not sure that is used anymore with this new code.

BrianHicks commented 9 years ago

marathon-consul doesn't use environment variables. In fact, I'm not sure it ever has. It hasn't ever opened connections to Marathon before either, so these are completely new flags. You'll need to set --marathon-location to wherever your Marathon is. How are you passing that marathon.shaka.r.tn.a.com value now?

hellertime commented 9 years ago

Oh sorry I had also used --marathon-location (I tried to recreate the command line from memory, apparently the marathon config history does not show the value of args)

hellertime commented 9 years ago

Ok, I specified the port this time. It looks like the HAProxy was getting in the way somehow. I'll do some more testing, but it looks like the issue was just an error in connectivity.

hellertime commented 9 years ago

Maybe I spoke too soon. I just tried a test where I uploaded a new app, then stopped marathon-consul, restarted some other services, then started marathon-consul. I'm still seeing both the old and new entries, and now saw this in the log:

time="2015-06-26T01:18:54Z" level=info msg=listening port=":4000" 
time="2015-06-26T01:18:54Z" level=info msg="syncing apps" 
time="2015-06-26T01:18:54Z" level=debug msg="asking Marathon for apps" location="marathon.shaka.r.tn.a.com:8086" 
time="2015-06-26T01:18:54Z" level=info msg="not handling event" eventType="subscribe_event" 
time="2015-06-26T01:18:54Z" level=debug msg="{\"clientIp\":\"198.18.162.196\",\"callbackUrl\":\"http://osd01.shaka.r.tn.a.com:10000/events\",\"eventType\":\"subscribe_event\",\"timestamp\":\"2015-06-26T01:19:00.858Z\"}" 
I0626 01:18:54.227569  1437 logging.cpp:172] INFO level logging started!
I0626 01:18:54.228809  1437 exec.cpp:132] Version: 0.22.1
I0626 01:18:54.230458  1453 exec.cpp:206] Executor registered on slave 20150420-133515-3315733190-5050-4643-S19
time="2015-06-26T01:18:54Z" level=info msg="handling event" eventType="status_update_event" 
time="2015-06-26T01:18:54Z" level=debug msg="{\"slaveId\":\"20150420-133515-3315733190-5050-4643-S19\",\"taskId\":\"bootstrap_service-discovery_marathon-consul.53456756-1ba1-11e5-8727-56847afe9799\",\"taskStatus\":\"TASK_RUNNING\",\"message\":\"\",\"appId\":\"/bootstrap/service-discovery/marathon-consul\",\"host\":\"osd01.shaka.r.tn.a.com\",\"ports\":[10000],\"version\":\"2015-06-26T01:19:00.171Z\",\"eventType\":\"status_update_event\",\"timestamp\":\"2015-06-26T01:19:01.022Z\"}" 
time="2015-06-26T01:18:54Z" level=info msg="syncing tasks" 
time="2015-06-26T01:18:54Z" level=debug msg="syncing tasks for app" app="/appimage-hdfs-httpfs" 
time="2015-06-26T01:18:54Z" level=debug msg="asking Marathon for tasks" app="/appimage-hdfs-httpfs" location="marathon.shaka.r.tn.a.com:8086" 
time="2015-06-26T01:18:54Z" level=error msg="invalid character 'a' looking for beginning of value" location="marathon.shaka.r.tn.a.com:8086" statusCode="È" 

BrianHicks commented 9 years ago

Looks like it was from the JSON parsing... Marathon returns JSON by default for every response but the tasks. 63d03d615db55d6ddab6229df1aaae6cf684a867 should work for you, can you try?
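For illustration, a Go sketch of one way to avoid this class of error: ask for JSON explicitly with an `Accept` header before decoding. This is not necessarily what the commit above does. `fetchTasks`, the `task` struct, and `fakeMarathon` are hypothetical; the fake server only imitates an endpoint that answers in plain text unless JSON is requested, and its tab-separated line is made up. Note that in both logs the rejected character ('b', 'a') matches the first letter of the app name, which would fit a plain-text body that begins with the app name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// task holds the fields this sketch reads from each task entry (assumed shape).
type task struct {
	ID   string `json:"id"`
	Host string `json:"host"`
}

type tasksResponse struct {
	Tasks []task `json:"tasks"`
}

// fetchTasks requests the tasks of one app and insists on a JSON response.
func fetchTasks(client *http.Client, baseURL, appID string) ([]task, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/v2/apps"+appID+"/tasks", nil)
	if err != nil {
		return nil, err
	}
	// Without this header the fake server below answers in plain text.
	req.Header.Set("Accept", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		// %d prints the numeric code. Converting an int directly to a string
		// in Go yields a character instead: 200 becomes "È".
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}

	var parsed tasksResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, fmt.Errorf("decoding tasks: %w", err)
	}
	return parsed.Tasks, nil
}

// fakeMarathon is a stand-in server for this example, not the real API.
func fakeMarathon() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Accept") == "application/json" {
			w.Header().Set("Content-Type", "application/json")
			fmt.Fprint(w, `{"tasks":[{"id":"web.1","host":"node1"}]}`)
			return
		}
		fmt.Fprint(w, "web\t10000\tnode1:31000\n")
	}))
}

func main() {
	srv := fakeMarathon()
	defer srv.Close()

	tasks, err := fetchTasks(srv.Client(), srv.URL, "/web")
	fmt.Println(tasks, err)
}
```

Feeding the plain-text line to `json.Decoder` instead fails with "invalid character 'w' looking for beginning of value", the same shape of error as in the logs above.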

hellertime commented 9 years ago

I'll have a look. Just need to sync this into my github fork...

BrianHicks commented 9 years ago

@hellertime I'm going to go ahead and get this merged. Thanks for your testing so far. Would you please open another ticket if you run into more problems?

hellertime commented 9 years ago

Yes. Please go ahead. I unfortunately ran into some trouble with my setup which delayed testing, but I'll definitely open an issue if I run into any.