Closed asfimport closed 9 years ago
Sathish: This patch fixes the issue. Since we want to use this feature in the next release of Hive, could someone review the changes and merge them to the main branch?
Sathish: Can someone look into this issue and provide comments or suggestions on the fix? The patch is attached and waiting to be merged to the main branch, as we want to use this Hive feature in our next release.
Szehon Ho: Hi Sathish, can you please fix the formatting? Indents are 2 spaces (Hive code is like that), and put a space after the comma, etc.
Otherwise it looks good to me. But granted, I'm not an expert on the Parquet schema, so my only question is whether it is compatible with other tools? + [~jcoffey], @rdblue for comments (if any).
Overall: -1 no tests executed
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663651/HIVE-7850.patch
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/465/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/465/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-465/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-465/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key3.q.out'
Reverted 'hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out'
Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java'
Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java'
Reverted 'hbase-handler/src/test/queries/positive/hbase_ppd_key_range.q'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyPredicateDecomposer.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBetween.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target hbase-handler/src/test/results/positive/hbase_ppd_or.q.out hbase-handler/src/test/queries/positive/hbase_ppd_or.q hbase-handler/src/java/org/apache/hadoop/hive/hbase/OrPredicateHBaseKeyFactory.java hbase-handler/src/java/org/apache/hadoop/hive/hbase/predicate hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanFactory.java testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1619922.
At revision 1619922.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
This message is automatically generated.
ATTACHMENT ID: 12663651
Overall: -1 no tests executed
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664109/HIVE-7850.1.patch
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-487/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-487/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
Reverted 'common/src/java/org/apache/hadoop/hive/conf/Validator.java'
Reverted 'service/src/java/org/apache/hive/service/cli/OperationState.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSession.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/SessionManager.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/Operation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/OperationManager.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1620279.
At revision 1620279.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
This message is automatically generated.
ATTACHMENT ID: 12664109
Ryan Blue / @rdblue: Looking at just the changes to the schema conversion, I'm not sure why the change to the list structure was done. Previously, lists were converted to:
// array<string> name
optional group name (LIST) {
  repeated group bag {
    optional binary array_element (UTF8);
  }
}
This allowed the list itself to be null and allowed null elements. This patch changes the conversion to:
// array<string> name
optional group name (LIST) {
  repeated binary array_element (UTF8);
}
This requires that the elements are non-null. Was this on purpose? The first one looks more correct to me, but the second would be correct if nulls aren't allowed in Hive lists. In addition, the HiveSchemaConverter#listWrapper method and the ParquetHiveSerDe.ARRAY static field are no longer used but were not removed.
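For reference, here is a small runnable sketch contrasting the two layouts programmatically, using parquet-column's MessageTypeParser. It is a sketch under assumptions: the pre-Apache parquet.schema package naming of that era, and spelling the string element as binary with a UTF8 annotation.

import parquet.schema.MessageType;
import parquet.schema.MessageTypeParser;

public class ListShapeCheck {
  public static void main(String[] args) {
    // Layout 1: the repeated level is a group wrapping an optional element,
    // so both the list and its elements may be null.
    MessageType nullableElements = MessageTypeParser.parseMessageType(
        "message m { optional group name (LIST) {"
            + " repeated group bag { optional binary array_element (UTF8); } } }");
    // Layout 2: the element itself is the repeated field, so elements
    // cannot be null (only the list as a whole can be).
    MessageType nonNullElements = MessageTypeParser.parseMessageType(
        "message m { optional group name (LIST) {"
            + " repeated binary array_element (UTF8); } }");
    System.out.println(nullableElements);
    System.out.println(nonNullElements);
  }
}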
The other change to schema conversion tests the Repetition and calls Types.required or Types.optional. This should instead call Types.primitive(type, repetition) to pass the repetition through to the Types API. That way, Repetition.REPEATED is supported as well; not handling it is a bug in the current patch.
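To make the difference concrete, here is a minimal sketch of the two conversion styles (the method names convertBranching and convertForwarding are hypothetical, and the parquet.schema package naming is an assumption):

import parquet.schema.PrimitiveType.PrimitiveTypeName;
import parquet.schema.Type;
import parquet.schema.Type.Repetition;
import parquet.schema.Types;

public class RepetitionSketch {
  // Branching on the repetition drops REPEATED on the floor:
  static Type convertBranching(PrimitiveTypeName typeName, Repetition rep, String name) {
    if (rep == Repetition.OPTIONAL) {
      return Types.optional(typeName).named(name);
    }
    return Types.required(typeName).named(name); // REPEATED silently becomes REQUIRED
  }

  // Forwarding the repetition handles REQUIRED, OPTIONAL, and REPEATED alike:
  static Type convertForwarding(PrimitiveTypeName typeName, Repetition rep, String name) {
    return Types.primitive(typeName, rep).named(name);
  }

  public static void main(String[] args) {
    System.out.println(convertForwarding(
        PrimitiveTypeName.BINARY, Repetition.REPEATED, "array_element"));
  }
}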
Ryan Blue / @rdblue: It looks like ArrayWritableGroupConverter is only used for maps and arrays, but the array handling was added mostly in this patch. Given that most of the methods check isMap and have completely different implementations for map and array, it makes more sense to separate this into two classes, ArrayGroupConverter and MapGroupConverter. Then HiveSchemaConverter should choose the correct one based on the OriginalType annotation. If there is no original type annotation but the type is repeated, it should use an ArrayGroupConverter.
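A hedged sketch of the dispatch being proposed (ArrayGroupConverter, MapGroupConverter, and StructGroupConverter are the hypothetical split classes, not existing code, and their constructors are assumptions):

import parquet.io.api.GroupConverter;
import parquet.schema.GroupType;
import parquet.schema.OriginalType;
import parquet.schema.Type.Repetition;

public class ConverterDispatchSketch {
  static GroupConverter chooseConverter(GroupType groupType) {
    OriginalType annotation = groupType.getOriginalType();
    if (annotation == OriginalType.MAP || annotation == OriginalType.MAP_KEY_VALUE) {
      return new MapGroupConverter(groupType);   // hypothetical map-only converter
    }
    if (annotation == OriginalType.LIST || groupType.isRepetition(Repetition.REPEATED)) {
      // An unannotated repeated group is treated as an array.
      return new ArrayGroupConverter(groupType); // hypothetical array-only converter
    }
    return new StructGroupConverter(groupType);  // hypothetical fallback for plain groups
  }
}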
Sathish: Hi Ryan, I agree that Hive should support lists with null elements. But can you give some idea of the cases where non-null lists are being generated? Whenever Parquet files are generated from Avro files, most of them have the array schema below:
optional group name (LIST) {
  repeated binary array_element (UTF8);
}
Can you suggest how best we can support both kinds of arrays? This patch only fixes arrays with no null entries.
Sathish: Used Types.primitive(type, repetition) as suggested by Ryan, and I am also working on separating the map and array group converters into two separate classes. I will update my patch once my changes are done.
Regarding the LIST structure, can you give your suggestions on how we can support both lists with NULL elements and normal non-null-element lists in Hive? I am of the opinion that we should build a separate structure for NULL-element lists, like the (NULL_LIST) shown below:
// array<string> name
optional group name (NULL_LIST) {
  repeated group bag {
    optional binary array_element (UTF8);
  }
}
Can you provide your suggestions on this?
Sathish: New patch submitted based on comments and suggestions from Ryan.
Ryan Blue / @rdblue: The array fix is something we need to do in the parquet-avro module. We know it's not allowing null elements, but Hive was, so that's why I mentioned it. Whether or not a null element is allowed depends on the repetition of the "array_element" field. If it is repeated, then it doesn't allow null. But the element inside the LIST has to be repeated, so to get a nullable type you have to create a new repeated group (named "bag") containing a single optional "array_element" field. The easy way to support both non-null and nullable array elements is to switch the "array_element" field between required and optional. But I don't think we need to support non-null array elements.
If Hive has an array<string> type, are the elements nullable? If they are, then we don't need to support the other case.
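In the schema notation used above, the required/optional switch Ryan describes keeps the same three-level structure and only flips the element's repetition (an illustrative sketch, not output from any tool):

// nullable elements
optional group name (LIST) {
  repeated group bag {
    optional binary array_element (UTF8);
  }
}

// non-null elements: same shape, element flipped to required
optional group name (LIST) {
  repeated group bag {
    required binary array_element (UTF8);
  }
}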
Overall: -1 no tests executed
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664368/HIVE-7850.2.patch
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/506/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/506/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-506/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-506/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
Reverted 'service/src/java/org/apache/hive/service/cli/ICLIService.java'
Reverted 'service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java'
Reverted 'service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java'
Reverted 'service/src/java/org/apache/hive/service/cli/CLIServiceClient.java'
Reverted 'service/src/java/org/apache/hive/service/cli/CLIService.java'
Reverted 'service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSession.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/SessionManager.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/Operation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/OperationManager.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java'
Reverted 'service/src/gen/thrift/gen-py/TCLIService/ttypes.py'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_types.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_types.h'
Reverted 'service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java'
Reverted 'service/if/TCLIService.thrift'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen service/target service/src/test/org/apache/hive/service/cli/operation service/src/java/org/apache/hive/service/cli/FetchType.java service/src/java/org/apache/hive/service/cli/operation/OperationLog.java service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
U ql/src/test/queries/clientpositive/optimize_nullscan.q
U ql/src/test/results/clientpositive/optimize_nullscan.q.out
U ql/src/test/results/clientpositive/tez/optimize_nullscan.q.out
Fetching external item into 'hcatalog/src/test/e2e/harness'
Updated external to revision 1620682.
Updated to revision 1620682.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
This message is automatically generated.
ATTACHMENT ID: 12664368
Sathish: Thanks Ryan. Based on your comments, it looks like no particular change is needed on the Hive SerDe side for handling non-nullable arrays; instead, the parquet-avro library needs to be fixed to convert the schema format properly. I am planning to work on fixing the parquet-avro library so that it generates Parquet files with a schema Hive can understand, and I will report my findings once my changes are done.
Sathish: Submitted my changes as a pull request to the parquet-mr repository. The details of the changes made in the parquet-avro module are here: https://github.com/apache/incubator-parquet-mr/pull/47
Can you review and provide suggestions on this fix?
Ryan Blue / @rdblue: This issue is fixed by HIVE-8909. That issue includes several tests that verify Hive can read existing data with arrays created by Avro and Thrift.
Daniel Haviv: Hi Ryan, does that mean it's included in the nightly build?
Thanks, Daniel
Ryan Blue / @rdblue: [~danielil], which nightly build are you referring to? It would be in Hive nightly builds because it's in trunk, but this hasn't been backported to the parquet-hive module in parquet-mr, so it wouldn't be there.
Daniel Haviv: Hi, it seems like something is broken now. I ran this query on a Parquet table written by 0.13:
0: jdbc:hive2://hdname:10000/default> select count(*) from A1;
[beeline result table not preserved]
1 row selected (48.401 seconds)
and when I run it from 0.15 I get:
2014-11-26 02:01:54,556 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:312)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.
and
2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator: 1 Close done
2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-11-26 02:01:55,579 INFO [main] org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: 11 Close done
2014-11-26 02:01:55,583 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
  at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115)
  at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271)
  ... 11 more
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:191)
  at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:117)
  at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
  at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
  ... 15 more
My parquet files were generated by Spark (I can share them if you need them for testing purposes).
Daniel
Ryan Blue / @rdblue: [~danielil], I wouldn't expect this to be fixed in your environment if you're running Hive 0.13.x. This is currently in the Hive trunk, but hasn't been released. If you were using nightly builds of Hive, then you would see the fix.
Pranav Singh: Can you please tell me which version this fix is going to be included in? Cloudera's CDH 5.3 says there are a lot of Hive and Parquet fixes; did this one get included?
This issue was posted long back on the Parquet issues list, and since it is related to the Parquet Hive SerDe, I have created the Hive issue here. The details and history are in this link: https://github.com/Parquet/parquet-mr/issues/281.
Reporter: Sathish
Assignee: Ryan Blue / @rdblue
Note: This issue was originally created as PARQUET-83. Please see the migration documentation for further details.