Esri / geoprocessing-tools-for-hadoop

The Hadoop GP Toolbox provides tools to exchange features between a Geodatabase and Hadoop and run Hadoop workflow jobs.
Apache License 2.0

Port Question - Copy from HDFS #17

Open StefanJaq opened 7 years ago

StefanJaq commented 7 years ago

Does the "Copy from HDFS" tool communicate only via the namenode port, which is usually 50070? Or can it also use other ports, such as those of the datanodes or ZooKeeper?

Additional question: If the customer is not sure which port the namenode (HDFS TCP port number) is configured on, how can they find out which port to use?

randallwhitman commented 7 years ago

It connects to the namenode and also directly to the data nodes. In answer to the second question, try confirming the WebHDFS port in a web browser, e.g. by pasting http://host.example.net:50070 into the address bar.
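The same check can be scripted instead of done in a browser. The following is only a minimal sketch, not part of the toolbox: the helper names `webhdfs_url` and `probe_webhdfs` are made up for illustration, and `host.example.net` is the placeholder host from above. It sends a WebHDFS `LISTSTATUS` request for the root directory to a candidate port and reports the HTTP status, if any.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def webhdfs_url(host, port, path="/", op="LISTSTATUS"):
    """Build a WebHDFS REST URL for the given namenode host and port."""
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?op={op}"


def probe_webhdfs(host, port, timeout=5):
    """Return the HTTP status of a LISTSTATUS request, or None if unreachable.

    Hypothetical helper for illustration; it only tells you whether
    something answers HTTP on that port.
    """
    try:
        with urllib.request.urlopen(webhdfs_url(host, port), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The port answered over HTTP, but with an error status (e.g. 403 or 404).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, blocked by a firewall, or host unknown.
        return None


if __name__ == "__main__":
    # Placeholder host and the usual Hadoop 2.x namenode web port.
    print(probe_webhdfs("host.example.net", 50070))
```

A status of 200 suggests the port is the WebHDFS port; `None` means nothing was reachable there from the client machine, which can also indicate a firewall in between. Note that this only exercises the namenode: in the WebHDFS REST API, a file read (`op=OPEN`) sent to the namenode is answered with a redirect to a data node, so a successful listing does not prove that the data node ports are reachable.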

StefanJaq commented 7 years ago

Thanks for your reply, but I'm not quite sure I understand your answer correctly, so I'll try to ask the question in a different way. Can the customer enter the name of a datanode and its port instead of the namenode and the namenode port? And is it necessary that the tool can access the datanodes? As we have a firewall between the ArcMap client and the Hadoop server, do we need to open other ports too?

As for my second question above: how can one find the namenode port if it is not 50070? (I know that this is not directly connected to this tool.)

randallwhitman commented 7 years ago

The name node is required in order to get file blocks from the right data nodes, so a data node cannot be entered in its place. Multiple ports will need to be open for the Geoprocessing Tools for Hadoop. If the namenode is configured on a non-default port, look up the port in the configuration files, or else ask the system administrator.
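As a rough illustration of looking up the port in the configuration files: in Hadoop 2.x the namenode web (WebHDFS) address is normally set by the `dfs.namenode.http-address` property in `hdfs-site.xml`. The sketch below assumes that property name applies to the cluster in question; the sample XML, the port 50071, and the helper names `get_property` and `namenode_http_port` are invented for the example. If the property is absent, the cluster is presumably using its default port.

```python
import xml.etree.ElementTree as ET

# Invented sample standing in for the contents of a real hdfs-site.xml.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>namenode.example.net:50071</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop configuration property, or None if unset."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


def namenode_http_port(xml_text):
    """Extract the port from dfs.namenode.http-address, or None if not found."""
    address = get_property(xml_text, "dfs.namenode.http-address")
    if address is None:
        return None
    _host, sep, port = address.rpartition(":")
    return int(port) if sep else None


if __name__ == "__main__":
    print(namenode_http_port(SAMPLE_HDFS_SITE))  # prints 50071
```

To use it against a real cluster, read the text of the cluster's own `hdfs-site.xml` (its location varies by distribution) and pass it to `namenode_http_port`.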