bitsofinfo / hazelcast-docker-swarm-discovery-spi

Docker Swarm based discovery strategy SPI for Hazelcast enabled applications
Apache License 2.0

SMAP discovery attempts to cluster with service NOT in service-names #36

Closed robinroos closed 5 years ago

robinroos commented 5 years ago

Hello,

At present each service in our Docker Swarm is scaled to 1, and in each case docker-service-names contains ONLY that service's name and no other entries. We therefore expect each service to find itself but not to find any other instances (of its own service name).

One of our service names (ip-ui) is, coincidentally, a sub-string of another service name (ip-ui-or).

We observe that, despite the setting:

docker-service-names = ip_service_ip-ui

the SwarmDiscoveryUtil logs contain:

Found qualifying docker service[ip_service_ip-ui-or]

The log extract was taken from Kibana. Note also that, although the service ip-ui-or is ALSO logging to the same Kibana instance, I have filtered the messages to show only those arising from

docker.container.labels.com.docker.stack.namespace=ip_service
docker.container.labels.com.docker.swarm.service.name=ip_service_ip_ui

The log extract below is in reverse order (most recent messages at the top).

    March 11th 2019, 13:55:02.448   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Returning set of discovered containers with size=2
    March 11th 2019, 13:55:02.448   INFO    com.hazelcast.spi.discovery.integration.DiscoveryService    [10.0.5.100]:5701 [test-ips-ui-session] [3.11.2] discoverNodes() DiscoveredContainers[2]: [10.0.5.101 : 8f88341b6968aa16be776d4e918ad1e07884413854bcd3e387b62db7448516ce, 10.0.5.100 : ac3a47f85d51f60591e8c37753cbc2429a4852777ca867dd8157ddf671b54d4a]
    March 11th 2019, 13:55:02.448   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=ztcf1we68vd2cuuzv5demty79, addr=10.0.2.43/24
    March 11th 2019, 13:55:02.448   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=z08r63wj5kk1s9plt28x98k0u, addr=10.0.4.145/24
    March 11th 2019, 13:55:02.447   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found qualifying docker service task[taskId: p92srrxuwv4oug3libdk167u8, container: 8f88341b6968aa16be776d4e918ad1e07884413854bcd3e387b62db7448516ce, state: running] on network: ips_network[un0i63ypy2vssq4y86ym3y5zi:10.0.5.101/24]
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=un0i63ypy2vssq4y86ym3y5zi, addr=10.0.5.163/24
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=z08r63wj5kk1s9plt28x98k0u, addr=10.0.4.152/24
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found qualifying docker service[ip_service_ip-ui-or] on network: ips_network[un0i63ypy2vssq4y86ym3y5zi:10.0.5.163/24]
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service with name=ip_service_ip-ui-or
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=ztcf1we68vd2cuuzv5demty79, addr=10.0.2.46/24
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found qualifying docker service task[taskId: wbmasl5g6a9f8n0ala7hguhzh, container: ac3a47f85d51f60591e8c37753cbc2429a4852777ca867dd8157ddf671b54d4a, state: running] on network: ips_network[un0i63ypy2vssq4y86ym3y5zi:10.0.5.100/24]
    March 11th 2019, 13:55:02.437   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found own task, adding regardless of state.
    March 11th 2019, 13:55:02.428   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service endpoint with networkId=un0i63ypy2vssq4y86ym3y5zi, addr=10.0.5.172/24
    March 11th 2019, 13:55:02.428   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found qualifying docker service[ip_service_ip-ui] on network: ips_network[un0i63ypy2vssq4y86ym3y5zi:10.0.5.172/24]
    March 11th 2019, 13:55:02.428   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Processing service with name=ip_service_ip-ui
    March 11th 2019, 13:55:02.428   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Number of services matching given criteria = 2
    March 11th 2019, 13:55:02.422   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Invoking criteria-based container discovery for dockerServiceName=ip_service_ip-ui
    March 11th 2019, 13:55:02.422   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy] Found relevant docker network: ips_network[un0i63ypy2vssq4y86ym3y5zi]
    March 11th 2019, 13:55:02.402   INFO    org.bitsofinfo.hazelcast.discovery.docker.swarm.SwarmDiscoveryUtil  SwarmDiscoveryUtil[DockerSwarmDiscoveryStrategy].discoverNodes(): via DOCKER_HOST: springboot-docker-proxy
    docker-network-names = ips_network
    docker-service-names = ip_service_ip-ui
    docker-service-labels =
    swarmMgrUri = http://springboot-docker-proxy:2375
    skipVerifySsl = false

robinroos commented 5 years ago

In our specific case we expect that the Hazelcast nodes will fail to cluster across applications, as their XML configs specify different group names.

A user who did not specify different group names, perhaps because they were using the default hazelcast.xml configuration, would be at risk of having their Hazelcast nodes cluster unintentionally.

robinroos commented 5 years ago

I believe the issue might lie in the Spotify DockerClient library, which provides the Service.Criteria API, or actually in the Docker /services endpoint, which interprets those criteria.

If those external libraries deliberately interpret serviceName criteria as being "beginWith" rather than "equals", then BitsOfInfo may have to add an explicit docker-service-names "contains" test for all services returned by the Criteria api.
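
The explicit test described above could look roughly like the following. This is a minimal sketch, not the library's actual code or the eventual PR: the class `ExactServiceNameFilter` and both of its methods are hypothetical names, and it assumes that docker-service-names is a comma-delimited list and that the names of the services returned by the criteria query are already available as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of the discovery SPI): keeps only services
// whose name exactly equals one of the configured docker-service-names.
final class ExactServiceNameFilter {

    private ExactServiceNameFilter() {}

    // Assumption: docker-service-names is a comma-delimited string.
    // Returns the trimmed, non-empty entries as a set.
    static Set<String> parseConfiguredNames(String dockerServiceNames) {
        Set<String> names = new HashSet<>();
        if (dockerServiceNames == null) {
            return names;
        }
        for (String raw : dockerServiceNames.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Drops every returned service name that is not an exact match,
    // preserving the order in which the names were returned.
    static List<String> keepExactMatches(List<String> returnedNames,
                                         Set<String> configuredNames) {
        List<String> kept = new ArrayList<>();
        for (String name : returnedNames) {
            if (configuredNames.contains(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

With docker-service-names = ip_service_ip-ui and a query result containing both ip_service_ip-ui and ip_service_ip-ui-or, only ip_service_ip-ui would survive this filter, so the tasks of ip_service_ip-ui-or would never be handed to Hazelcast as candidate peers.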

robinroos commented 5 years ago

Here is confirmation that a cluster node of service "ip-ui" attempted to cluster with another node belonging to a different Hazelcast group:

March 11th 2019, 16:27:44.283 | WARN | com.hazelcast.cluster | [10.0.5.107]:5701 [test-ips-ui-session] [3.11.2] Node could not join cluster at node: [10.0.5.109]:5701 Cause: the target cluster has a different group-name

Group name "test-ips-ui-session" corresponds to the group name of the cache node (for SpringSession) of the service "ip-ui" in the "test" environment.

bitsofinfo commented 5 years ago

We call the Spotify docker client library. The Spotify lib calls the docker services API, which has this behavior. Hazelcast will of course attempt to connect to whatever peer IPs are yielded by these API calls. For your case the simplest thing would be to just adjust how you name your services as they appear w/ docker service ls; otherwise feel free to submit a PR that implements some sort of negation capability to filter the set after all API calls have been made.

robinroos commented 5 years ago

Cool, thanks. I will do a PR for this as it will help others and not just me.

robinroos commented 5 years ago

I have raised PR #37 to address this.

robinroos commented 5 years ago

If your review approves the PR, please publish it as RC14.