Open mateusz-blaszkowski opened 8 years ago
On GCE, every instance in a network is reachable by hostname. Same on AWS, if the VPC is configured with DNS hostnames.
This seems to be a general Hadoop requirement (see the Cloudera docs). We might be able to set dfs.namenode.datanode.registration.ip-hostname-check to false, but I can't find an equivalent for YARN.
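For reference, here is a minimal sketch of what that HDFS setting would look like as a `<property>` entry for hdfs-site.xml. The helper name `render_property` is made up for illustration and is not part of PerfKitBenchmarker. As far as I know, the property only relaxes the NameNode's check when DataNodes register, so it would not help with the YARN side.

```python
import xml.etree.ElementTree as ET


def render_property(name, value):
    """Return a Hadoop-style <property> element as a string.

    Hypothetical helper for illustration; not PerfKitBenchmarker code.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# The property name comes from the comment above; "false" disables the check.
snippet = render_property(
    "dfs.namenode.datanode.registration.ip-hostname-check", "false")
print(snippet)
```

The printed element would go inside the `<configuration>` root of hdfs-site.xml.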
@mateusz-blaszkowski - do you want to include your /etc/hosts patch in PerfKitBenchmarker? The addition could be made conditional on nodes not being reachable by name.
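One possible way to make the addition conditional, sketched under assumptions: check which peer hostnames fail to resolve and only patch /etc/hosts for those. The function name `unresolvable_hosts` and the injectable `resolver` parameter are hypothetical, and in PerfKitBenchmarker the check would have to run on each VM rather than on the machine driving the benchmark.

```python
import socket


def unresolvable_hosts(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that the resolver cannot resolve.

    `resolver` is injectable so the check can be exercised without DNS;
    by default it uses the standard library's socket.gethostbyname.
    """
    missing = []
    for name in hostnames:
        try:
            resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing
```

If the returned list is empty (as it should be on GCE), the /etc/hosts step could be skipped entirely.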
Any thoughts on this?
I have the same problem with Cassandra YCSB on OpenStack. Maybe we could add the hostnames to /etc/hosts in the VM post-creation steps, just to be sure.
The symptom is very similar to the ones described in #142 and #744. I set terasort_num_rows to a small number (like 1000) to rule out a long-running generate/sort process as the cause. The memory threshold is also high enough (like 32 or 64GB). The benchmark hangs in the teragen phase (/tmp/pkb/hadoop/bin/yarn jar /tmp/pkb/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar teragen 1000 /teragen' 1> /tmp/pkb/cmd6e619289-6a25-48dc-af46-ff5e86e90808.log).
After a deep dive, it turned out that the Hadoop cluster had trouble reaching other instances by hostname. The ResourceManager log from the master instance shows that it tried to communicate with the pkb-e358f15f-1 instance but couldn't resolve the hostname.
I worked around this in hadoop_terasort_benchmark.py by simply generating new entries in the /etc/hosts file for each instance. But I'm not sure it's the right solution, because the problem may be specific to Mesos (or Kubernetes). How is this resolved in GCE? Is every instance reachable from every other instance in the same network by hostname? Does anyone know whether this can be fixed in the Hadoop configuration itself (for example, by forcing the ResourceManager to use IP addresses instead of hostnames)?
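To illustrate the /etc/hosts workaround, here is a rough sketch of how the entries could be generated. This is not the actual patch: the function name `hosts_entries` and the IP addresses are made up, and only the hostname pkb-e358f15f-1 comes from the log mentioned above.

```python
def hosts_entries(vms):
    """Build /etc/hosts lines from (internal_ip, hostname) pairs.

    Hypothetical helper; each pair becomes one "IP<TAB>hostname" line.
    """
    return "".join(f"{ip}\t{name}\n" for ip, name in vms)


# Example addresses are invented for illustration only.
example = hosts_entries([
    ("10.0.0.2", "pkb-e358f15f-0"),
    ("10.0.0.3", "pkb-e358f15f-1"),
])
print(example, end="")
```

The resulting text would then have to be appended to /etc/hosts on every instance, so that each node can resolve all of its peers.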