Esri / gis-tools-for-hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
http://esri.github.io/gis-tools-for-hadoop/
Apache License 2.0

'Unexpected error: [Errno 11004] getaddrinfo failed' Error while Migrating GDB Feature Class to HDFS using ArcGIS Tools #40

Open mahendersg opened 8 years ago

mahendersg commented 8 years ago

We are facing an issue while migrating a GDB feature class to Hadoop HDFS using the GIS Tools for Hadoop geoprocessing tools. The system environment is:

- ArcGIS client: 10.3.1 / 10.2.2
- Hadoop version: 2.4.1
- Python version: 2.7.5
- ArcSDE: 10.2.2
- RDBMS: Oracle 11.2.0.4
- Cluster: 1 master node, 1 secondary node, 8 data nodes

The following steps were followed to install and configure the GIS Tools for Hadoop environment:

a) Added the geoprocessing tools for Hadoop, downloaded from https://github.com/Esri/gis-tools-for-hadoop, to the Hadoop environment.

b) Enabled WebHDFS by editing /opt/hadoop/etc/hadoop/hdfs-site.xml.

c) Added the jars 'spatial-sdk-hadoop.jar' and 'esri-geometry-api.jar' to /opt/hadoop-2.4.1/share/hadoop/tools/lib on our Hadoop master node.

d) Browsed to the geoprocessing toolbox containing the Python script tools for Hadoop using ArcCatalog 10.3.1.

e) With the Hadoop tools enabled in ArcGIS, converted the feature class to a JSON file using the 'Features To JSON' tool in the Hadoop toolbox.

f) Used the 'Copy To HDFS' script tool in the Hadoop toolbox to copy the JSON file to HDFS.

g) Got the error message 'Unexpected error: [Errno 11004] getaddrinfo failed'.
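For context on where step (g) fails: the Copy To HDFS tool uploads through the WebHDFS REST API in two steps. It first sends an op=CREATE request to the namenode, which answers with a 307 redirect, and then PUTs the file body to the datanode named in the redirect. A minimal sketch of the first request URL (the hostname, port, and HDFS path are illustrative placeholders, not taken from this cluster):

```python
# Sketch of the WebHDFS CREATE step used by the Copy To HDFS tool.
# The hostname, port, and HDFS path here are placeholders.
def webhdfs_create_url(host, port, hdfs_path, overwrite=True):
    """Build the namenode URL for op=CREATE; the namenode replies
    with a 307 redirect whose Location header names a datanode."""
    flag = "true" if overwrite else "false"
    return ("http://%s:%d/webhdfs/v1%s?op=CREATE&overwrite=%s"
            % (host, port, hdfs_path, flag))

print(webhdfs_create_url("namenode", 50070, "/data/Building.json"))
```

The client then issues a PUT of the file body to the redirect URL; that second hop is where 'getaddrinfo failed' is raised if the datanode hostname in the redirect is not resolvable from the client.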

Error message after running tool:

```
Start Time: Wed Mar 09 18:43:44 2016
Running script CopyToHDFS...
Unexpected error : [Errno 11004] getaddrinfo failed
Traceback (most recent call last):
  File "", line 184, in execute
  File "D:\GIS tools for hadoop\geoprocessing-tools-for-hadoop-master\geoprocessing-tools-for-hadoop-master\webhdfs\webhdfs.py", line 91, in copyToHDFS
    fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 829, in _send_output
    self.send(msg)
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 791, in send
    self.connect()
  File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "C:\Python27\ArcGIS10.2\Lib\socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed
```
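For reference, [Errno 11004] is a Windows resolver error (WSANO_DATA): the client could not turn the hostname in the request into a usable IP address. Python surfaces it as socket.gaierror, which can be reproduced in isolation with a deliberately unresolvable name (the .invalid TLD is reserved and never resolves; the hostname below is a placeholder):

```python
import socket

# getaddrinfo raises socket.gaierror when a hostname cannot be
# resolved; on Windows this carries [Errno 11004] or [Errno 11001].
try:
    socket.getaddrinfo("datanode-internal.invalid", 50075, 0, socket.SOCK_STREAM)
except socket.gaierror as err:
    print("getaddrinfo failed:", err)
```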

We followed all the guidelines and steps specified in following weblinks and references:

https://esri.github.io/gis-tools-for-hadoop/
https://github.com/Esri/gis-tools-for-hadoop/wiki

Please advise on how to resolve this.

randallwhitman commented 8 years ago

Cross-reference:
https://github.com/Esri/gis-tools-for-hadoop/issues/22
https://github.com/Esri/geoprocessing-tools-for-hadoop/issues/14

climbage commented 8 years ago

This error is happening during the redirect from the namenode to the datanode that is actually storing the data. You can tell because it has redirect_path in the stack trace.

fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})

First, verify that the datanodes are accessible from the client machine running ArcGIS. If they aren't, you will need to make them available to the client.

Second, verify that the namenode is not using network addresses that are internal to the cluster. If you browse to http://[namenode hostname]:50070/dfsnodelist.jsp?whatNodes=LIVE, you should see the Transferring Address that the namenode uses in datanode redirects. Make sure that the client is able to connect to those datanodes using the transferring addresses.
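To check both points from the ArcGIS client machine, a short script like the following can help (the hostnames are placeholders; substitute the transferring addresses reported by the namenode, and note 50075 is the default datanode HTTP port):

```python
import socket

def reachable(host, port, timeout=5):
    """True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # socket.gaierror (DNS failure) is a subclass of OSError
        return False

# Placeholder datanode transferring addresses; replace with the ones
# shown on the namenode's dfsnodelist.jsp page.
for host in ["datanode-01", "datanode-02"]:
    print(host, "reachable" if reachable(host, 50075) else "NOT reachable")
```

A host that prints NOT reachable either does not resolve from the client (the 11004 case) or is blocked by the network.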

Let us know what you figure out.

mahendersg commented 8 years ago

We are still getting [Errno 11004] getaddrinfo failed with the GIS Tools, but we proceeded with an alternate method to move Building.json (~6.5 GB, exported from the Building feature class of ~48 million records) to HDFS. The following steps were used to load the JSON into Hadoop HDFS using Hive, following the document https://github.com/Esri/gis-tools-for-hadoop/wiki/Aggregating-CSV-Data-%28Spatial-Binning% . After migrating the Building JSON into the Building table, Hive aggregation queries fail with an error. Detailed steps below:

Add Jar

```sql
add jar /volumes/disk1/tc/gis-tools-for-hadoop-master/gis-tools-for-hadoop-master/samples/lib/esri-geometry-api.jar;
add jar /volumes/disk1/tc/gis-tools-for-hadoop-master/gis-tools-for-hadoop-master/samples/lib/spatial-sdk-hadoop.jar;

create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';
create temporary function ST_Contains as 'com.esri.hadoop.hive.ST_Contains';
create temporary function ST_AsText as 'com.esri.hadoop.hive.ST_AsText';
create temporary function ST_Intersection as 'com.esri.hadoop.hive.ST_Intersection';
```

Create Table

```sql
create external table Building (
  OBJECTID INT, RILUNIQUEID string, RILFEATURECODE string, BLDGNO string, BLDGNAME string,
  BLDGTYPE string, BLDGSUBTYPE string, BLDGCLASS string, BLDGROAD string, BLDGSUBROAD string,
  SUBLOCALITY string, CITYNAME string, STATENAME string, BLDGSIZE string, TAG string,
  PINCODE INT, NUMBEROFFLATS INT, NUMBEROFSHOPS INT, BLDG_TYPE string, CABLEOPERATORNAME string,
  AREA_1 INT, LBU2 string, SOCIETYCOMPLEXNAME string, BLDGCONDITION string, BLDGCONSTRUCTION string,
  AFFLUENCEINDICATOR string, ROOFTOPANTENNA string, REMARKS string, VINTAGE INT, BOI string,
  NETWORKREF string, NOOFCOMMERCIAL INT, BUILDING_RJID string, UPDATESOURCE string, PLOTSURVEYNO string,
  TPY_ID string, LOCALITYNAME string, SUBSUBLOCALITY string, CITYCODE string, LOCALITYCODE string,
  LOCALITY_RJID string, DATASOURCE string, CREATED_USER string, CREATED_DATE string,
  LAST_EDITED_USER string, LAST_EDITED_DATE string, LTERFS string, FTTXRFS string, BLCMSTATUS string,
  TALUKCODE string, TALUKNAME string, DISTRICTCODE string, DISTRICTNAME string, BOICATEGORY string,
  LTE_COVERAGE string, NEIGHBOURHOODCODE string, JIOCENTERNAME string, NUMBEROFFLOORS INT,
  VILLAGENAME string, VILLAGE_RJID string, JIOCENTERCODE string, BLDG_CATEGORY string,
  GLOBALID_1 string, JIOCENTER_RJID string, JIOCENTER_SAP_ID string, INCOME_LEVEL string,
  boundaryshape binary
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```

Load data

```
hadoop fs -put /volumes/disk1/tc/Building.json /volumes;
hadoop fs -ls /volumes;
```

Then, in Hive:

```sql
LOAD DATA INPATH '/volumes/Building.json' OVERWRITE INTO TABLE Building;
```

No errors were observed during the load.

```
hive> describe extended building;
OK
objectid              int       from deserializer
riluniqueid           string    from deserializer
rilfeaturecode        string    from deserializer
bldgno                string    from deserializer
bldgname              string    from deserializer
bldgtype              string    from deserializer
bldgsubtype           string    from deserializer
bldgclass             string    from deserializer
bldgroad              string    from deserializer
bldgsubroad           string    from deserializer
sublocality           string    from deserializer
cityname              string    from deserializer
statename             string    from deserializer
bldgsize              string    from deserializer
tag                   string    from deserializer
pincode               int       from deserializer
numberofflats         int       from deserializer
numberofshops         int       from deserializer
bldg_type             string    from deserializer
cableoperatorname     string    from deserializer
area_1                int       from deserializer
lbu2                  string    from deserializer
societycomplexname    string    from deserializer
bldgcondition         string    from deserializer
bldgconstruction      string    from deserializer
affluenceindicator    string    from deserializer
rooftopantenna        string    from deserializer
remarks               string    from deserializer
vintage               int       from deserializer
boi                   string    from deserializer
networkref            string    from deserializer
noofcommercial        int       from deserializer
building_rjid         string    from deserializer
updatesource          string    from deserializer
plotsurveyno          string    from deserializer
tpy_id                string    from deserializer
localityname          string    from deserializer
subsublocality        string    from deserializer
citycode              string    from deserializer
localitycode          string    from deserializer
locality_rjid         string    from deserializer
datasource            string    from deserializer
created_user          string    from deserializer
created_date          string    from deserializer
last_edited_user      string    from deserializer
last_edited_date      string    from deserializer
lterfs                string    from deserializer
fttxrfs               string    from deserializer
blcmstatus            string    from deserializer
talukcode             string    from deserializer
talukname             string    from deserializer
districtcode          string    from deserializer
districtname          string    from deserializer
boicategory           string    from deserializer
lte_coverage          string    from deserializer
neighbourhoodcode     string    from deserializer
jiocentername         string    from deserializer
numberoffloors        int       from deserializer
villagename           string    from deserializer
village_rjid          string    from deserializer
jiocentercode         string    from deserializer
bldg_category         string    from deserializer
globalid_1            string    from deserializer
jiocenter_rjid        string    from deserializer
jiocenter_sap_id      string    from deserializer
income_level          string    from deserializer
boundaryshape         binary    from deserializer

Detailed Table Information
Table(tableName:building, dbName:landbase, owner:hadoop, createTime:1459342351, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[
  FieldSchema(name:objectid, type:int, comment:null), FieldSchema(name:riluniqueid, type:string, comment:null),
  FieldSchema(name:rilfeaturecode, type:string, comment:null), FieldSchema(name:bldgno, type:string, comment:null),
  FieldSchema(name:bldgname, type:string, comment:null), FieldSchema(name:bldgtype, type:string, comment:null),
  FieldSchema(name:bldgsubtype, type:string, comment:null), FieldSchema(name:bldgclass, type:string, comment:null),
  FieldSchema(name:bldgroad, type:string, comment:null), FieldSchema(name:bldgsubroad, type:string, comment:null),
  FieldSchema(name:sublocality, type:string, comment:null), FieldSchema(name:cityname, type:string, comment:null),
  FieldSchema(name:statename, type:string, comment:null), FieldSchema(name:bldgsize, type:string, comment:null),
  FieldSchema(name:tag, type:string, comment:null), FieldSchema(name:pincode, type:int, comment:null),
  FieldSchema(name:numberofflats, type:int, comment:null), FieldSchema(name:numberofshops, type:int, comment:null),
  FieldSchema(name:bldg_type, type:string, comment:null), FieldSchema(name:cableoperatorname, type:string, comment:null),
  FieldSchema(name:area_1, type:int, comment:null), FieldSchema(name:lbu2, type:string, comment:null),
  FieldSchema(name:societycomplexname, type:string, comment:null), FieldSchema(name:bldgcondition, type:string, comment:null),
  FieldSchema(name:bldgconstruction, type:string, comment:null), FieldSchema(name:affluenceindicator, type:string, comment:null),
  FieldSchema(name:rooftopantenna, type:string, comment:null), FieldSchema(name:remarks, type:string, comment:null),
  FieldSchema(name:vintage, type:int, comment:null), FieldSchema(name:boi, type:string, comment:null),
  FieldSchema(name:networkref, type:string, comment:null), FieldSchema(name:noofcommercial, type:int, comment:null),
  FieldSchema(name:building_rjid, type:string, comment:null), FieldSchema(name:updatesource, type:string, comment:null),
  FieldSchema(name:plotsurveyno, type:string, comment:null), FieldSchema(name:tpy_id, type:string, comment:null),
  FieldSchema(name:localityname, type:string, comment:null), FieldSchema(name:subsublocality, type:string, comment:null),
  FieldSchema(name:citycode, type:string, comment:null), FieldSchema(name:localitycode, type:string, comment:null),
  FieldSchema(name:locality_rjid, type:string, comment:null), FieldSchema(name:datasource, type:string, comment:null),
  FieldSchema(name:created_user, type:string, comment:null), FieldSchema(name:created_date, type:string, comment:null),
  FieldSchema(name:last_edited_user, type:string, comment:null), FieldSchema(name:last_edited_date, type:string, comment:null),
  FieldSchema(name:lterfs, type:string, comment:null), FieldSchema(name:fttxrfs, type:string, comment:null),
  FieldSchema(name:blcmstatus, type:string, comment:null), FieldSchema(name:talukcode, type:string, comment:null),
  FieldSchema(name:talukname, type:string, comment:null), FieldSchema(name:districtcode, type:string, comment:null),
  FieldSchema(name:districtname, type:string, comment:null), FieldSchema(name:boicategory, type:string, comment:null),
  FieldSchema(name:lte_coverage, type:string, comment:null), FieldSchema(name:neighbourhoodcode, type:string, comment:null),
  FieldSchema(name:jiocentername, type:string, comment:null), FieldSchema(name:numberoffloors, type:int, comment:null),
  FieldSchema(name:villagename, type:string, comment:null), FieldSchema(name:village_rjid, type:string, comment:null),
  FieldSchema(name:jiocentercode, type:string, comment:null), FieldSchema(name:bldg_category, type:string, comment:null),
  FieldSchema(name:globalid_1, type:string, comment:null), FieldSchema(name:jiocenter_rjid, type:string, comment:null),
  FieldSchema(name:jiocenter_sap_id, type:string, comment:null), FieldSchema(name:income_level, type:string, comment:null),
  FieldSchema(name:boundaryshape, type:binary, comment:null)],
location:hdfs://jiogis-cluster-jiogis-master-001:9000/user/hive/warehouse/landbase.db/building,
inputFormat:com.esri.json.hadoop.EnclosedJsonInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:com.esri.hadoop.hive.serde.JsonSerde, parameters:{serialization.format=1}),
bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}),
storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, EXTERNAL=TRUE, transient_lastDdlTime=1459342519, COLUMN_STATS_ACCURATE=true,
totalSize=6665990138, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
Time taken: 0.208 seconds, Fetched: 69 row(s)
```

```
hive> select count(OBJECTID) from building;
Query ID = hadoop_20160411131717_73f71c12-353a-4119-8ab3-913d978a2dc1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1460354375516_0007, Tracking URL = http://jiogis-cluster-jiogis-master-001:8088/proxy/application_1460354375516_0007/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1460354375516_0007
Hadoop job information for Stage-1: number of mappers: 25; number of reducers: 1
2016-04-11 13:17:57,568 Stage-1 map = 0%,  reduce = 0%
2016-04-11 13:18:51,298 Stage-1 map = 88%,  reduce = 100%, Cumulative CPU 33.68 sec
2016-04-11 13:18:52,323 Stage-1 map = 100%,  reduce = 100%
MapReduce Total cumulative CPU time: 33 seconds 680 msec
Ended Job = job_1460354375516_0007 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1460354375516_0007_m_000000 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000003 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000001 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000008 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000015 (and more) from job job_1460354375516_0007
```

Task with the most failures(4):

Task ID: task_1460354375516_0007_m_000000

URL:

http://jiogis-cluster-jiogis-master-001:8088/taskdetails.jsp?jobid=job_1460354375516_0007&tipid=task_1460354375516_0007_m_000000

Diagnostic Messages for this Task:

```
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing writable
{"attributes":{"OBJECTID":40712,"SUBLOCALITY":"Shakti Nagar 2","CITYNAME":"Bhuj","STATENAME":"Gujarat","TAG":null,
"PINCODE":370427,"LBU2":"NEW","VINTAGE":2011,"BOI":null,"BUILDING_RJID":"BHUJBD0031982","LOCALITYNAME":"Sanskar Nagar",
"SUBSUBLOCALITY":null,"CITYCODE":"BHUJ","LOCALITYCODE":"SNKR","LOCALITY_RJID":"LOY71336","DATASOURCE":null,
"FTTXRFS":null,"BLCMSTATUS":null,"TALUKCODE":"BHUJ","TALUKNAME":"Bhuj","DISTRICTCODE":"BHUJ","DISTRICTNAME":"Kachchh",
"BOICATEGORY":null,"NEIGHBOURHOODCODE":null,"JIOCENTERNAME":"Bhuj","VILLAGENAME":"Mirjhapar (CT)","VILLAGE_RJID":"VIE78276",
"JIOCENTERCODE":"JC01","GLOBALID_1":"{87ACB15B-BB59-42FB-8737-5111B9A239B6}","JIOCENTER_RJID":"GJ-BHUJ-JC01-0275",
"JIOCENTER_SAP_ID":"I-GJ-BHUJ-JCO-0001","SHAPE_Length":35.082851836058126,"SHAPE_Area":66.70308817988206},
"geometry":{"curveRings":[[[-1293826.0616008043,2638881.98328707],[-1293835.0307057127,2638881.8490332216],
[-1293835.104782246,2638888.9112596065],[-1293824.5208404362,2638889.0695212036],[-1293824.4993598238,2638887.027283214],
[-1293825.616667755,2638887.010383025],{"c":[[-1293826.1089845225,2638886.5079577304],[-1293825.966182138,2638886.8604469104]]},
[-1293826.0616008043,2638881.98328707]]]}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing writable [same record as above]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
    ... 8 more
Caused by: java.lang.NullPointerException
    at com.esri.hadoop.hive.GeometryUtils.serialize(Unknown Source)
    at com.esri.hadoop.hive.GeometryUtils.access$000(Unknown Source)
    at com.esri.hadoop.hive.GeometryUtils$CachedGeometryBytesWritable.<init>(Unknown Source)
    at com.esri.hadoop.hive.GeometryUtils.geometryToEsriShapeBytesWritable(Unknown Source)
    at com.esri.hadoop.hive.serde.JsonSerde.deserialize(Unknown Source)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:492)
    ... 9 more
```

```
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 25  Reduce: 1  Cumulative CPU: 33.68 sec  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 33 seconds 680 msec
```

climbage commented 8 years ago

I see in the JSON for the failed record that you have a geometry with curves. Unfortunately, the Java geometry library only supports simple feature types and not curves.

```json
"geometry": {"curveRings": [[[-1293826.0616008043, 2638881.98328707],
  [-1293835.0307057127, 2638881.8490332216], [-1293835.104782246, 2638888.9112596065],
  [-1293824.5208404362, 2638889.0695212036], [-1293824.4993598238, 2638887.027283214],
  [-1293825.616667755, 2638887.010383025],
  {"c": [[-1293826.1089845225, 2638886.5079577304], [-1293825.966182138, 2638886.8604469104]]},
  [-1293826.0616008043, 2638881.98328707]]]}
```
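A possible workaround, sketched below under the assumption that curve features are rare, is to scan the exported Esri JSON for curve geometries before loading, then densify or exclude those features in ArcGIS. The function name has_curves is hypothetical, not part of the toolset; it keys off the curveRings/curvePaths geometry keys that mark true curves:

```python
import json

# Esri JSON marks geometries containing true curves with these keys
# (plain polygons/polylines use "rings"/"paths" instead).
CURVE_KEYS = {"curveRings", "curvePaths"}

def has_curves(feature_json):
    """True if an Esri JSON feature uses curve segments, which the
    Hive JsonSerde / Java geometry API cannot deserialize."""
    geometry = json.loads(feature_json).get("geometry", {})
    return bool(CURVE_KEYS & set(geometry))

curved = '{"geometry": {"curveRings": [[[0, 0], {"c": [[1, 1], [0.5, 0.2]]}, [0, 0]]]}}'
simple = '{"geometry": {"rings": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}}'
print(has_curves(curved), has_curves(simple))
```

Features flagged this way could be densified in ArcGIS, so that curves become straight-segment approximations, before running Features To JSON again.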