Open dvonck opened 9 years ago
Interesting. We will try to reproduce this. In the meantime, can you disable vectorization to try and get around the error?
set hive.vectorized.execution.enabled = false;
This may affect the performance of the queries.
Hi Michael
Using set hive.vectorized.execution.enabled = false; and set hive.default.fileformat=TextFile; made the queries work. Having looked at the source code, it appears that ORC does not know how to work with the spatial types in columns.
private ColumnVector allocateColumnVector(String type, int defaultSize) {
  if (type.equalsIgnoreCase("double")) {
    return new DoubleColumnVector(defaultSize);
  } else if (VectorizationContext.isStringFamily(type)) {
    return new BytesColumnVector(defaultSize);
  } else if (VectorizationContext.decimalTypePattern.matcher(type).matches()) {
    int[] precisionScale = getScalePrecisionFromDecimalType(type);
    return new DecimalColumnVector(defaultSize, precisionScale[0], precisionScale[1]);
  } else if (type.equalsIgnoreCase("long") ||
      type.equalsIgnoreCase("date") ||
      type.equalsIgnoreCase("timestamp")) {
    return new LongColumnVector(defaultSize);
  } else {
    // VectorizedRowBatchCtx.java:643, the frame named in the stack trace
    throw new Error("Cannot allocate vector column for " + type);
  }
}
Thank you very much for your help.
You can close this issue.
Regards
Derck
From: Michael Park [mailto:notifications@github.com] Sent: 23 June 2015 04:37 PM To: Esri/spatial-framework-for-hadoop Cc: Derck Vonck Subject: Re: [spatial-framework-for-hadoop] Using the spatial framework for hadoop with data stored in ORC files (#85)
Hey, we are able to run spatial data with ORC files.
I ran into the same problem as you. After some research I figured out that the Tez engine uses vectorization, which does not support the binary data type. When we compute ST_Point or ST_Polygon, the result is binary data, so just disabling vectorization for this step solves the problem.
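The fall-through described in this thread can be illustrated with a small, self-contained Java sketch. It is not Hive code: the class `VectorTypeDispatchSketch`, the enum `VectorKind`, and both regular expressions are hypothetical stand-ins, and the sketch assumes that the string-family check does not cover binary. Under those assumptions, a scratch column whose type name is `binary` or `None` reaches the final branch, which is where the quoted `allocateColumnVector` throws.

```java
import java.util.regex.Pattern;

/** Simplified stand-in for the type dispatch quoted in this thread (not Hive code). */
final class VectorTypeDispatchSketch {

    /** Hypothetical labels standing in for Hive's ColumnVector subclasses. */
    enum VectorKind { DOUBLE, BYTES, DECIMAL, LONG }

    // Assumption: decimal types are spelled "decimal" or "decimal(p,s)".
    private static final Pattern DECIMAL_TYPE =
        Pattern.compile("decimal.*", Pattern.CASE_INSENSITIVE);

    // Assumption: the string family is string, char(n), and varchar(n) only.
    private static final Pattern STRING_FAMILY =
        Pattern.compile("string|char.*|varchar.*", Pattern.CASE_INSENSITIVE);

    /** Follows the branch order of the quoted method; unknown type names fall through. */
    static VectorKind kindFor(String type) {
        if (type.equalsIgnoreCase("double")) {
            return VectorKind.DOUBLE;
        } else if (STRING_FAMILY.matcher(type).matches()) {
            return VectorKind.BYTES;
        } else if (DECIMAL_TYPE.matcher(type).matches()) {
            return VectorKind.DECIMAL;
        } else if (type.equalsIgnoreCase("long")
                || type.equalsIgnoreCase("date")
                || type.equalsIgnoreCase("timestamp")) {
            return VectorKind.LONG;
        } else {
            // The quoted Hive code throws java.lang.Error here; this sketch uses an exception.
            throw new IllegalArgumentException("Cannot allocate vector column for " + type);
        }
    }

    /** Returns true when the type name maps to one of the known vector kinds. */
    static boolean canAllocate(String type) {
        try {
            kindFor(type);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String type : new String[] {"double", "string", "timestamp", "binary", "None"}) {
            String result = canAllocate(type) ? kindFor(type).name() : "not allocatable";
            System.out.println(type + " -> " + result);
        }
    }
}
```

A check like `canAllocate` is only a way to see which type names the quoted dispatch would accept; the workaround used in this thread remains turning vectorization off for the affected query.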
I don't see this method that is called out on master:
Do we think this is still a problem on Hive master?
It looks like it was changed in this commit:
https://github.com/apache/hive/commit/30f20e992e05754efc4b984030b01f0184e0359d
Then the code in
at some point was updated to include binary support, or so it appears.
Good Afternoon,
The ORC format allows for the efficient storage and retrieval of big data files. For more details see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC.
We have installed a Hadoop cluster based on the Hortonworks Data Platform 2.2.6.0-2800.
When we work with CSV files in Hive we do not have any problems. When we use the ORC file format we get the following error.
ORC Problem
[hive@srv-hc10 ~]$ hive
hive> add jar esri-geometry-api-1.2.1.jar spatial-sdk-hive-1.0.3-SNAPSHOT.jar spatial-sdk-json-1.0.3-SNAPSHOT.jar;
Added [esri-geometry-api-1.2.1.jar, spatial-sdk-hive-1.0.3-SNAPSHOT.jar, spatial-sdk-json-1.0.3-SNAPSHOT.jar] to class path
Added resources: [esri-geometry-api-1.2.1.jar, spatial-sdk-hive-1.0.3-SNAPSHOT.jar, spatial-sdk-json-1.0.3-SNAPSHOT.jar]
hive> create temporary function ST_Bin as 'com.esri.hadoop.hive.ST_Bin';
OK
Time taken: 0.636 seconds
hive> create temporary function ST_BinEnvelope as 'com.esri.hadoop.hive.ST_BinEnvelope';
OK
Time taken: 0.014 seconds
hive> describe formatted xxxxxxx.events_orc;
OK
col_name              data_type    comment
vehicle_id            int
ignition              smallint
event_ts              bigint
event_description     string
longitude             double
latitude              double
altitude              string
speed                 smallint
bearing               smallint
linear_g              double
lateral_g             double
trip_no               int

Detailed Table Information
Database:             xxxxxxx
Owner:                root
CreateTime:           Thu Jun 18 22:41:42 SAST 2015
LastAccessTime:       UNKNOWN
Protect Mode:         None
Retention:            0
Location:             hdfs://srv-hcm01.esri-southafrica.com:8020/apps/hive/warehouse/xxxxxxx.db/events_orc
Table Type:           MANAGED_TABLE
Table Parameters:
    COLUMN_STATS_ACCURATE    false
    auto.purge               true
    comment                  xxxxxxx analysis table
    last_modified_by         root
    last_modified_time       1434727038
    numFiles                 62
    numRows                  -1
    orc.compress             SNAPPY
    rawDataSize              -1
    totalSize                1954173667
    transient_lastDdlTime    1434727038

Storage Information
SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:           No
Num Buckets:          62
Bucket Columns:       [vehicle_id]
Sort Columns:         [Order(col:event_ts, order:1)]
Storage Desc Params:
    serialization.format     1
Time taken: 1.135 seconds, Fetched: 47 row(s)
hive> select ST_Bin(0.001, ST_Point(longitude, latitude)) as binvalue, count(*) as freq
Status: Running (Executing on YARN cluster with App id application_1434395264469_0091)
Map 1 FAILED 68 0 0 68 153 67
Reducer 2 KILLED 8 0 0 8 0 8
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 23.79 s
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1434395264469_0091_1_00, diagnostics=[Task failed, taskId=task_1434395264469_0091_1_00_000011, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.Error: Cannot allocate vector column for None
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Error: Cannot allocate vector column for None
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.allocateColumnVector(VectorizedRowBatchCtx.java:643)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addScratchColumnsToBatch(VectorizedRowBatchCtx.java:606)
    at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.createVectorizedRowBatch(VectorizedRowBatchCtx.java:339)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:109)
    at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.createValue(VectorizedOrcInputFormat.java:49)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:58)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.createValue(HiveRecordReader.java:33)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.createValue(TezGroupedSplitsInputFormat.java:141)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:150)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:80)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:609)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:588)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:140)
    at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:361)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:134)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
    ... 13 more
], TaskAttempt 1 failed, TaskAttempt 2 failed, TaskAttempt 3 failed (each with the same error and an identical stack trace)], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1434395264469_0091_1_00 [Map 1] killed/failed due to:null]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1434395264469_0091_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1434395264469_0091_1_01 [Reducer 2] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
hive>
Could you please investigate if this is viable.
Regards
Derck