confluentinc / cp-docker-images

[DEPRECATED] Docker images for Confluent Platform.
Apache License 2.0

Problems when KAFKA_JMX_HOSTNAME not specified on Amazon ECS - "hostname: Name or service not known" #450

Open hackmad opened 6 years ago

hackmad commented 6 years ago

On Amazon ECS, if you set KAFKA_JMX_PORT without setting KAFKA_JMX_HOSTNAME, the following line fails with the error "hostname: Name or service not known", and KAFKA_JMX_HOSTNAME ends up exported as an empty string:

https://github.com/confluentinc/cp-docker-images/blob/c9602a45d38cf877052ed31f98c7590e07919556/debian/zookeeper/include/etc/confluent/docker/launch#L28

Testing out the command in the container:

root@ff51b2cce038:/# hostname -i | cut -d" " -f1
hostname: Name or service not known

I believe that -i needs to be replaced with --all-ip-addresses:

root@ff51b2cce038:/# hostname --all-ip-addresses | cut -d" " -f1
169.254.172.2
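A minimal sketch of a more defensive default, assuming you override the launch script in a derived image: keep hostname -i where it works (it does in the default bridge network) and fall back to the first address from hostname --all-ip-addresses (-I) only when it fails. jmx_default_host is a hypothetical name, not something the Confluent images provide.

```bash
# Hypothetical helper, not part of the Confluent images: choose a default
# KAFKA_JMX_HOSTNAME without aborting when the container name does not resolve.
jmx_default_host() {
  local addr
  # `hostname -i` resolves the container's own hostname; on ECS (and with
  # --net=host) that lookup can fail, so hide the error and tolerate failure.
  addr=$(hostname -i 2>/dev/null | cut -d" " -f1) || true
  if [ -z "$addr" ]; then
    # `hostname -I` (--all-ip-addresses) lists interface addresses without a
    # name lookup; take the first one, as the launch script does for -i.
    addr=$(hostname -I 2>/dev/null | cut -d" " -f1) || true
  fi
  printf '%s\n' "$addr"
}
```

It would be used as export KAFKA_JMX_HOSTNAME=${KAFKA_JMX_HOSTNAME:-$(jmx_default_host)}. Note that on ECS the first -I address is the link-local 169.254.172.2 shown above, which may not be the address you want clients to connect to.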
hackmad commented 6 years ago

We tried working around this with our own Dockerfile based on the Confluent image:

FROM confluentinc/cp-zookeeper:4.0.0-3

LABEL maintainer="LoyaltyOne"

COPY bootstrap /usr/local/bin/

ENTRYPOINT ["/usr/local/bin/bootstrap"]
CMD ["/etc/confluent/docker/run"]

The bootstrap script:

#!/bin/bash

set -e

# Specifying KAFKA_JMX_HOSTNAME in ECS causes problems because service discovery isn't available yet.
# If the JMX port is specified but the host name isn't, use the container's IP bound to the ENI
# Note: This is specifically for containers running in ECS in awsvpc network mode.
if [ -n "${KAFKA_JMX_PORT}" ];
then
    export KAFKA_JMX_HOSTNAME=${KAFKA_JMX_HOSTNAME:-$(hostname -I | cut -d" " -f2)}
    echo "KAFKA_JMX_HOSTNAME: ${KAFKA_JMX_HOSTNAME}"
fi

exec "$@"
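The bootstrap above relies on the ENI address always being the second field of hostname -I. A hedged alternative sketch, assuming the address to avoid is always the link-local one (169.254.172.2 earlier in this issue): skip 169.254.0.0/16 and IPv6 entries and return the first remaining IPv4 address. first_routable_ipv4 is a hypothetical name. This only changes which IP is chosen for KAFKA_JMX_HOSTNAME.

```bash
# Hypothetical alternative to `hostname -I | cut -d" " -f2`: instead of relying
# on field order, return the first IPv4 address that is not link-local.
# Addresses may be passed as arguments; otherwise `hostname -I` is used.
first_routable_ipv4() {
  local addr
  # Word-splitting of the unquoted expansion is intentional (one IP per word).
  for addr in ${*:-$(hostname -I)}; do
    case "$addr" in
      169.254.*) continue ;;                       # link-local (ECS bridge)
      *:*) continue ;;                             # IPv6 address
      *.*.*.*) printf '%s\n' "$addr"; return 0 ;;  # first usable IPv4
    esac
  done
  return 1                                         # nothing suitable found
}
```

In the bootstrap it would replace the cut call: export KAFKA_JMX_HOSTNAME=${KAFKA_JMX_HOSTNAME:-$(first_routable_ipv4)}.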

But we now get this exception:

Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: 7d3f41d6a5c9: 7d3f41d6a5c9: Name or service not known

The ECS task logs show that the script does pick up the correct IP address:

2018-03-22 23:10:12
KAFKA_JMX_HOSTNAME: 10.0.5.248
echo "===> ENV Variables ..."
+ echo '===> ENV Variables ...'
env | sort
===> ENV Variables ...
+ env
+ sort
ALLOW_UNSIGNED=false
COMPONENT=zookeeper
CONFLUENT_DEB_VERSION=1
CONFLUENT_MAJOR_VERSION=4
CONFLUENT_MINOR_VERSION=0
CONFLUENT_MVN_LABEL=
CONFLUENT_PATCH_VERSION=0
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_VERSION=4.0.0
CUB_CLASSPATH=/etc/confluent/docker/docker-utils.jar
HOME=/root
HOSTNAME=7d3f41d6a5c9
KAFKA_JMX_HOSTNAME=10.0.5.248
KAFKA_JMX_PORT=8989
KAFKA_VERSION=1.0.0
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
PYTHON_PIP_VERSION=8.1.2
PYTHON_VERSION=2.7.9-1
SCALA_VERSION=2.11
SHLVL=1
ZOOKEEPER_CLIENT_PORT=2181
ZOOKEEPER_INIT_LIMIT=5
ZOOKEEPER_SERVERS=dev-kafka-zk1.**********:2888:3888;dev-kafka-zk2.**********:2888:3888;0.0.0.0:2888:3888
ZOOKEEPER_SERVER_ID=3
ZOOKEEPER_SYNC_LIMIT=2
ZOOKEEPER_TICK_TIME=2000
ZULU_OPENJDK_VERSION=8=8.17.0.3
_=/usr/bin/env
echo "===> User"
+ echo '===> User'
id
===> User
+ id
uid=0(root) gid=0(root) groups=0(root)
echo "===> Configuring ..."
+ echo '===> Configuring ...'
/etc/confluent/docker/configure
===> Configuring ...
+ /etc/confluent/docker/configure
dub ensure ZOOKEEPER_CLIENT_PORT
+ dub ensure ZOOKEEPER_CLIENT_PORT
dub path /etc/kafka/ writable
+ dub path /etc/kafka/ writable
# myid is required for clusters
if [[ -n "${ZOOKEEPER_SERVERS-}" ]]
then
dub ensure ZOOKEEPER_SERVER_ID
export ZOOKEEPER_INIT_LIMIT=${ZOOKEEPER_INIT_LIMIT:-"10"}
export ZOOKEEPER_SYNC_LIMIT=${ZOOKEEPER_SYNC_LIMIT:-"5"}
fi
+ [[ -n dev-kafka-zk1.**********:2888:3888;dev-kafka-zk2.**********:2888:3888;0.0.0.0:2888:3888 ]]
+ dub ensure ZOOKEEPER_SERVER_ID
+ export ZOOKEEPER_INIT_LIMIT=5
+ ZOOKEEPER_INIT_LIMIT=5
+ export ZOOKEEPER_SYNC_LIMIT=2
+ ZOOKEEPER_SYNC_LIMIT=2
if [[ -n "${ZOOKEEPER_SERVER_ID-}" ]]
then
dub template "/etc/confluent/docker/myid.template" "/var/lib/${COMPONENT}/data/myid"
fi
+ [[ -n 3 ]]
+ dub template /etc/confluent/docker/myid.template /var/lib/zookeeper/data/myid
if [[ -n "${KAFKA_JMX_OPTS-}" ]]
then
if [[ ! $KAFKA_JMX_OPTS == *"com.sun.management.jmxremote.rmi.port"* ]]
then
echo "KAFKA_OPTS should contain 'com.sun.management.jmxremote.rmi.port' property. It is required for accessing the JMX metrics externally."
fi
fi
+ [[ -n '' ]]
dub template "/etc/confluent/docker/${COMPONENT}.properties.template" "/etc/kafka/${COMPONENT}.properties"
+ dub template /etc/confluent/docker/zookeeper.properties.template /etc/kafka/zookeeper.properties
dub template "/etc/confluent/docker/log4j.properties.template" "/etc/kafka/log4j.properties"
+ dub template /etc/confluent/docker/log4j.properties.template /etc/kafka/log4j.properties
dub template "/etc/confluent/docker/tools-log4j.properties.template" "/etc/kafka/tools-log4j.properties"
+ dub template /etc/confluent/docker/tools-log4j.properties.template /etc/kafka/tools-log4j.properties
echo "===> Running preflight checks ... "
===> Running preflight checks ...
+ echo '===> Running preflight checks ... '
/etc/confluent/docker/ensure
+ /etc/confluent/docker/ensure
echo "===> Check if /var/lib/zookeeper/data is writable ..."
+ echo '===> Check if /var/lib/zookeeper/data is writable ...'
dub path /var/lib/zookeeper/data writable
===> Check if /var/lib/zookeeper/data is writable ...
+ dub path /var/lib/zookeeper/data writable
===> Check if /var/lib/zookeeper/log is writable ...
echo "===> Check if /var/lib/zookeeper/log is writable ..."
+ echo '===> Check if /var/lib/zookeeper/log is writable ...'
dub path /var/lib/zookeeper/log writable
+ dub path /var/lib/zookeeper/log writable
echo "===> Launching ... "
+ echo '===> Launching ... '
exec /etc/confluent/docker/launch
+ exec /etc/confluent/docker/launch
===> Launching ...
===> Printing /var/lib/zookeeper/data/myid
3===> Launching zookeeper ...
Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name unknown: java.net.UnknownHostException: 7d3f41d6a5c9: 7d3f41d6a5c9: Name or service not known

The JMX agent seems to always look up the container's short hostname (7d3f41d6a5c9, the value of HOSTNAME) instead of using KAFKA_JMX_HOSTNAME. We also tried hostname -A | cut -d" " -f1, which gives the internal hostname associated with the container.
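A small diagnostic sketch for confirming this, assuming a glibc-based image where getent is available (the cp images are Debian-based): it asks the system resolver for the container's short hostname, which is the name in the exception above. hostname_resolves is a hypothetical helper name.

```bash
# Hypothetical diagnostic: can the system resolver find the container's short
# hostname? The exception above names exactly this host ($HOSTNAME).
# getent consults the sources in /etc/nsswitch.conf (usually /etc/hosts, then DNS).
hostname_resolves() {
  local name="${1:-$HOSTNAME}"
  getent hosts "$name" >/dev/null
}
```

For example, hostname_resolves || echo "$HOSTNAME does not resolve; the JMX agent will likely fail" in the bootstrap script would flag the problem before the JVM starts.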

Everything works fine if we do not enable JMX.

ukayani commented 6 years ago

👍

hackmad commented 6 years ago

So we solved it this way in our bootstrap script:

#!/bin/bash

set -e

if [ -n "${KAFKA_JMX_PORT}" ] || [ -n "${KAFKA_JMX_HOSTNAME}" ] || [ -n "${KAFKA_JMX_OPTS}" ];
then
    # The JMX agent fails with MalformedURLException because it cannot resolve
    # the ECS container's short hostname; mapping that name to 127.0.0.1 fixes it.
    echo "127.0.0.1     $HOSTNAME" | tee -a /etc/hosts
fi
echo "KAFKA_JMX_HOSTNAME: ${KAFKA_JMX_HOSTNAME}"
echo "KAFKA_JMX_PORT: ${KAFKA_JMX_PORT}"

exec "$@"
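A variant of the /etc/hosts workaround, sketched under the assumption that the entrypoint may run more than once against the same hosts file: it appends the 127.0.0.1 mapping only when the name is not already listed. ensure_hosts_entry and its optional path argument are hypothetical; the path parameter just makes it easy to try against a scratch file.

```bash
# Hypothetical idempotent version of: echo "127.0.0.1 $HOSTNAME" >> /etc/hosts
ensure_hosts_entry() {
  local name="${1:-$HOSTNAME}" hosts_file="${2:-/etc/hosts}"
  # awk exits 0 if the name already appears as a host field on a non-comment line.
  if ! awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                        END { exit !found }' "$hosts_file"; then
    printf '127.0.0.1\t%s\n' "$name" >> "$hosts_file"
  fi
}
```

In the bootstrap it would take the place of the tee line, for example ensure_hosts_entry "$HOSTNAME" just before exec "$@".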

Also note that on ECS, both KAFKA_JMX_PORT and KAFKA_JMX_HOSTNAME must be set for this to work.
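A hedged pre-flight check that encodes this requirement, so a task with only KAFKA_JMX_PORT set fails fast with a clear message instead of the JMX agent exception. check_jmx_env is a hypothetical name; it would be called from the bootstrap script before exec "$@".

```bash
# Hypothetical check: on ECS, JMX only worked with both variables set.
check_jmx_env() {
  if [ -n "${KAFKA_JMX_PORT:-}" ] && [ -z "${KAFKA_JMX_HOSTNAME:-}" ]; then
    echo "KAFKA_JMX_PORT is set but KAFKA_JMX_HOSTNAME is not; set both on ECS" >&2
    return 1
  fi
}
```

Usage: check_jmx_env || exit 1.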

soxofaan commented 5 years ago

I encountered the same issue when following the Kafka Connect tutorial at https://docs.confluent.io/current/installation/docker/docs/installation/connect-avro-jdbc.html

I could reproduce the "hostname: Name or service not known" warning in isolation:

$ docker run -it --rm --net=host confluentinc/cp-base hostname -i
hostname: Name or service not known

$ docker run -it --rm --net=host confluentinc/cp-base hostname -I
192.168.65.3 172.17.0.1

$ docker run -it --rm confluentinc/cp-base hostname -i
172.17.0.2