apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[HUDI-1609] How to disable Hive JDBC and enable metastore #1679

Closed. selvarajperiyasamy closed this issue 3 years ago.

selvarajperiyasamy commented 4 years ago

Team,

My Spark version is 2.3.0, Scala version is 2.11.8, and Hive version is 1.2.2.

I see the comment below in the Hudi code. How can I start using the metastore client for Hive registrations? Is there a way to disable the useJdbc flag?

// Support both JDBC and metastore based implementations for backwards compatibility. Future users should
// disable jdbc and depend on metastore client for all hive registrations

Below is my log. It makes a Hive JDBC connection and fails due to a method-not-available error.

20/05/26 15:38:15 INFO HoodieSparkSqlWriter$: Syncing to Hive Metastore (URL: jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2)
20/05/26 15:38:15 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/etc/spark2/2.6.5.179-4/0/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1153590032_1, ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
20/05/26 15:38:15 INFO HiveConf: Found configuration file file:/etc/spark2/2.6.5.179-4/0/hive-site.xml
20/05/26 15:38:16 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://oprhqanameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/etc/spark2/2.6.5.179-4/0/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1153590032_1, ugi=svchdc36q@VISA.COM (auth:KERBEROS)]]]
20/05/26 15:38:16 INFO HoodieTableConfig: Loading dataset properties from /projects/cdp/data/cdp_reporting/trr/.hoodie/hoodie.properties
20/05/26 15:38:16 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE from /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO HoodieTableMetaClient: Loading Active commit timeline for /projects/cdp/data/cdp_reporting/trr
20/05/26 15:38:16 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@a1fca5a
20/05/26 15:38:16 INFO HoodieHiveClient: Creating hive connection jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:16 INFO Utils: Supplied authorities: server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181
20/05/26 15:38:16 INFO CuratorFrameworkImpl: Starting
20/05/26 15:38:16 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-4--1, built on 08/09/2019 23:18 GMT
20/05/26 15:38:16 INFO ZooKeeper: Client environment:host.name=server4.visa.com
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.version=1.8.0_241
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_241-amd64/jre
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.class.path=/usr/hdp/current/spark2-client/conf/:/usr/hdp/current/spark2-client/jars/hk2-api-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/JavaEWAH-0.3.2.jar:/usr/hdp/current/spark2-client/jars/commons-pool-1.5.4.jar:/usr/hdp/current/spark2-client/jars/RoaringBitmap-0.5.11.jar:/usr/hdp/current/spark2-client/jars/hk2-locator-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/ST4-4.0.4.jar:/usr/hdp/current/spark2-client/jars/compress-lzf-1.0.3.jar:/usr/hdp/current/spark2-client/jars/activation-1.1.1.jar:/usr/hdp/current/spark2-client/jars/core-1.1.2.jar:/usr/hdp/current/spark2-client/jars/aircompressor-0.8.jar:/usr/hdp/current/spark2-client/jars/hk2-utils-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/antlr-2.7.7.jar:/usr/hdp/current/spark2-client/jars/curator-client-2.7.1.jar:/usr/hdp/current/spark2-client/jars/antlr-runtime-3.4.jar:/usr/hdp/current/spark2-client/jars/curator-framework-2.7.1.jar:/usr/hdp/current/spark2-client/jars/antlr4-runtime-4.7.jar:/usr/hdp/current/spark2-client/jars/ivy-2.4.0.jar:/usr/hdp/current/spark2-client/jars/aopalliance-1.0.jar:/usr/hdp/current/spark2-client/jars/commons-io-2.4.jar:/usr/hdp/current/spark2-client/jars/janino-3.0.8.jar:/usr/hdp/current/spark2-client/jars/aopalliance-repackaged-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/commons-collections-3.2.2.jar:/usr/hdp/current/spark2-client/jars/apache-log4j-extras-1.2.17.jar:/usr/hdp/current/spark2-client/jars/curator-recipes-2.7.1.jar:/usr/hdp/current/spark2-client/jars/apacheds-i18n-2.0.0-M15.jar:/usr/hdp/current/spark2-client/jars/commons-cli-1.2.jar:/usr/hdp/current/spark2-client/jars/javax.inject-1.jar:/usr/hdp/current/spark2-client/jars/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/current/spark2-client/jars/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/current/spark2-client/jars/api-asn1-api-1.0.0-M20.jar:/usr/hdp/current/spark2-client/jars/datanucleus-core-3.2.10.jar:/usr/hdp/current/spark2-client/jars/api-util-1.0.0-M20.jar:/usr/hdp/current/spark2-client/jars/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/spark2-client/jars/arpack_combined_all-0.1.jar:/usr/hdp/current/spark2-client/jars/derby-10.12.1.1.jar:/usr/hdp/current/spark2-client/jars/arrow-format-0.8.0.jar:/usr/hdp/current/spark2-client/jars/eigenbase-properties-1.1.5.jar:/usr/hdp/current/spark2-client/jars/arrow-memory-0.8.0.jar:/usr/hdp/current/spark2-client/jars/flatbuffers-1.2.0-3f79e055.jar:/usr/hdp/current/spark2-client/jars/arrow-vector-0.8.0.jar:/usr/hdp/current/spark2-client/jars/hppc-0.7.2.jar:/usr/hdp/current/spark2-client/jars/avro-1.7.7.jar:/usr/hdp/current/spark2-client/jars/httpclient-4.5.2.jar:/usr/hdp/current/spark2-client/jars/avro-ipc-1.7.7.jar:/usr/hdp/current/spark2-client/jars/commons-compiler-3.0.8.jar:/usr/hdp/current/spark2-client/jars/avro-mapred-1.7.7-hadoop2.jar:/usr/hdp/current/spark2-client/jars/commons-compress-1.4.1.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-core-1.10.6.jar:/usr/hdp/current/spark2-client/jars/guava-14.0.1.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-kms-1.10.6.jar:/usr/hdp/current/spark2-client/jars/gson-2.2.4.jar:/usr/hdp/current/spark2-client/jars/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/current/spark2-client/jars/commons-configuration-1.6.jar:/usr/hdp/current/spark2-client/jars/azure-data-lake-store-sdk-2.1.4.jar:/usr/hdp/current/spark2-client/jars/commons-lang-2.6.jar:/usr/hdp/current/spark2-client/jars/jpam-1.1.jar:/usr/hdp/current/spark2-client/jars/azure-keyvault-core-0.8.0.jar:/usr/hdp/current/spark2-client/jars
/guice-3.0.jar:/usr/hdp/current/spark2-client/jars/azure-storage-5.4.0.jar:/usr/hdp/current/spark2-client/jars/httpcore-4.4.4.jar:/usr/hdp/current/spark2-client/jars/base64-2.3.8.jar:/usr/hdp/current/spark2-client/jars/guice-servlet-3.0.jar:/usr/hdp/current/spark2-client/jars/bcprov-jdk15on-1.58.jar:/usr/hdp/current/spark2-client/jars/hadoop-aws-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/bonecp-0.8.0.RELEASE.jar:/usr/hdp/current/spark2-client/jars/commons-lang3-3.5.jar:/usr/hdp/current/spark2-client/jars/breeze-macros_2.11-0.13.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-auth-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/breeze_2.11-0.13.2.jar:/usr/hdp/current/spark2-client/jars/commons-codec-1.10.jar:/usr/hdp/current/spark2-client/jars/jta-1.1.jar:/usr/hdp/current/spark2-client/jars/calcite-avatica-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/commons-logging-1.1.3.jar:/usr/hdp/current/spark2-client/jars/calcite-core-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/commons-math3-3.4.1.jar:/usr/hdp/current/spark2-client/jars/calcite-linq4j-1.2.0-incubating.jar:/usr/hdp/current/spark2-client/jars/hadoop-azure-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/chill-java-0.8.4.jar:/usr/hdp/current/spark2-client/jars/hadoop-client-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/chill_2.11-0.8.4.jar:/usr/hdp/current/spark2-client/jars/hadoop-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-beanutils-1.7.0.jar:/usr/hdp/current/spark2-client/jars/gcs-connector-1.8.1.2.6.5.179-4-shaded.jar:/usr/hdp/current/spark2-client/jars/commons-beanutils-core-1.8.0.jar:/usr/hdp/current/spark2-client/jars/commons-crypto-1.0.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-hdfs-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-dbcp-1.4.jar:/usr/hdp/current/spark2-client/jars/hive-beeline-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-digester-1.8.jar:/usr/hdp/current/spark2-client/jars/hive-cli-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/commons-httpclient-3.1.jar:/usr/hdp/current/spark2-client/jars/jackson-core-2.6.7.jar:/usr/hdp/current/spark2-client/jars/commons-net-2.2.jar:/usr/hdp/current/spark2-client/jars/javolution-5.5.1.jar:/usr/hdp/current/spark2-client/jars/jersey-server-2.22.2.jar:/usr/hdp/current/spark2-client/jars/xz-1.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-annotations-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jets3t-0.9.4.jar:/usr/hdp/current/spark2-client/jars/okhttp-2.7.5.jar:/usr/hdp/current/spark2-client/jars/hadoop-azure-datalake-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-container-servlet-2.22.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-app-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jdo-api-3.0.1.jar:/usr/hdp/current/spark2-client/jars/libfb303-0.9.3.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-container-servlet-core-2.22.2.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-core-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-client-2.22.2.jar:/usr/hdp/current/spark2-client/jars/netty-3.9.9.Final.jar:/usr/hdp/current/spark2-client/jars/hadoop-mapreduce-client-jobclient-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-common-2.22.2.jar:/usr/hdp/current/spark2-client/jars/netty-all-4.1.17.Final.jar:/usr/hdp/current/spark2-client/jars/
hadoop-mapreduce-client-shuffle-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/okio-1.6.0.jar:/usr/hdp/current/spark2-client/jars/hadoop-openstack-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-api-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jetty-util-6.1.26.hwx.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-client-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jline-2.12.1.jar:/usr/hdp/current/spark2-client/jars/opencsv-2.3.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/joda-time-2.9.3.jar:/usr/hdp/current/spark2-client/jars/oro-2.0.8.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-registry-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-guava-2.22.2.jar:/usr/hdp/current/spark2-client/jars/paranamer-2.8.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-server-common-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jersey-media-jaxb-2.22.2.jar:/usr/hdp/current/spark2-client/jars/py4j-0.10.6.jar:/usr/hdp/current/spark2-client/jars/hadoop-yarn-server-web-proxy-2.7.3.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-ast_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/hive-exec-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-core_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/hive-jdbc-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/jodd-core-3.5.2.jar:/usr/hdp/current/spark2-client/jars/pyrolite-4.13.jar:/usr/hdp/current/spark2-client/jars/hive-metastore-1.21.2.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/json4s-jackson_2.11-3.2.11.jar:/usr/hdp/current/spark2-client/jars/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark2-client/jars/jsp-api-2.1.jar:/usr/hdp/current/spark2-client/jars/jackson-annotations-2.6.7.jar:/usr/hdp/current/spark2-client/jars/libthrift-0.9.3.jar:/usr/hdp/current/spark2-client/jars/jackson-core-asl-1.9.13.jar:/usr/hdp/current/spark2-client/jars/jsr305-1.3.9.jar:/usr/hdp/current/spark2-client/jars/jackson-databind-2.6.7.1.jar:/usr/hdp/current/spark2-client/jars/jtransforms-2.4.0.jar:/usr/hdp/current/spark2-client/jars/jackson-dataformat-cbor-2.6.7.jar:/usr/hdp/current/spark2-client/jars/log4j-1.2.17.jar:/usr/hdp/current/spark2-client/jars/jackson-jaxrs-1.9.13.jar:/usr/hdp/current/spark2-client/jars/jul-to-slf4j-1.7.16.jar:/usr/hdp/current/spark2-client/jars/jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/spark2-client/jars/kryo-shaded-3.0.3.jar:/usr/hdp/current/spark2-client/jars/jackson-module-paranamer-2.7.9.jar:/usr/hdp/current/spark2-client/jars/json-smart-1.3.1.jar:/usr/hdp/current/spark2-client/jars/scalap-2.11.8.jar:/usr/hdp/current/spark2-client/jars/jackson-module-scala_2.11-2.6.7.1.jar:/usr/hdp/current/spark2-client/jars/lz4-java-1.4.0.jar:/usr/hdp/current/spark2-client/jars/jackson-xc-1.9.13.jar:/usr/hdp/current/spark2-client/jars/machinist_2.11-0.6.1.jar:/usr/hdp/current/spark2-client/jars/java-xmlbuilder-1.1.jar:/usr/hdp/current/spark2-client/jars/macro-compat_2.11-1.1.1.jar:/usr/hdp/current/spark2-client/jars/javassist-3.18.1-GA.jar:/usr/hdp/current/spark2-client/jars/leveldbjni-all-1.8.jar:/usr/hdp/current/spark2-client/jars/javax.annotation-api-1.2.jar:/usr/hdp/current/spark2-client/jars/metrics-core-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.inject-2.4.0-b34.jar:/usr/hdp/current/spark2-client/jars/metric
s-graphite-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.servlet-api-3.1.0.jar:/usr/hdp/current/spark2-client/jars/metrics-json-3.1.5.jar:/usr/hdp/current/spark2-client/jars/javax.ws.rs-api-2.0.1.jar:/usr/hdp/current/spark2-client/jars/nimbus-jose-jwt-4.41.1.jar:/usr/hdp/current/spark2-client/jars/jaxb-api-2.2.2.jar:/usr/hdp/current/spark2-client/jars/metrics-jvm-3.1.5.jar:/usr/hdp/current/spark2-client/jars/jcip-annotations-1.0-1.jar:/usr/hdp/current/spark2-client/jars/minlog-1.3.0.jar:/usr/hdp/current/spark2-client/jars/jcl-over-slf4j-1.7.16.jar:/usr/hdp/current/spark2-client/jars/parquet-column-1.8.2.jar:/usr/hdp/current/spark2-client/jars/objenesis-2.1.jar:/usr/hdp/current/spark2-client/jars/spire_2.11-0.13.0.jar:/usr/hdp/current/spark2-client/jars/orc-core-1.4.3.2.6.5.179-4-nohive.jar:/usr/hdp/current/spark2-client/jars/stax-api-1.0-2.jar:/usr/hdp/current/spark2-client/jars/orc-mapreduce-1.4.3.2.6.5.179-4-nohive.jar:/usr/hdp/current/spark2-client/jars/osgi-resource-locator-1.0.1.jar:/usr/hdp/current/spark2-client/jars/parquet-common-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-encoding-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-format-2.3.1.jar:/usr/hdp/current/spark2-client/jars/parquet-hadoop-1.8.2.jar:/usr/hdp/current/spark2-client/jars/parquet-hadoop-bundle-1.6.0.jar:/usr/hdp/current/spark2-client/jars/parquet-jackson-1.8.2.jar:/usr/hdp/current/spark2-client/jars/protobuf-java-2.5.0.jar:/usr/hdp/current/spark2-client/jars/scala-compiler-2.11.8.jar:/usr/hdp/current/spark2-client/jars/scala-library-2.11.8.jar:/usr/hdp/current/spark2-client/jars/stax-api-1.0.1.jar:/usr/hdp/current/spark2-client/jars/scala-parser-combinators_2.11-1.0.4.jar:/usr/hdp/current/spark2-client/jars/scala-reflect-2.11.8.jar:/usr/hdp/current/spark2-client/jars/scala-xml_2.11-1.0.5.jar:/usr/hdp/current/spark2-client/jars/shapeless_2.11-2.3.2.jar:/usr/hdp/current/spark2-client/jars/slf4j-api-1.7.16.jar:/usr/hdp/current/spark2-client/jars/slf4j-log4j12-1.7.16.jar:/usr/hdp/current/spark2-client/jars/snappy-0.2.jar:/usr/hdp/current/spark2-client/jars/snappy-java-1.1.2.6.jar:/usr/hdp/current/spark2-client/jars/stream-2.7.0.jar:/usr/hdp/current/spark2-client/jars/spark-catalyst_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/stringtemplate-3.2.1.jar:/usr/hdp/current/spark2-client/jars/spark-cloud_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/super-csv-2.2.0.jar:/usr/hdp/current/spark2-client/jars/spark-core_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/univocity-parsers-2.5.9.jar:/usr/hdp/current/spark2-client/jars/spark-graphx_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-unsafe_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-hadoop-cloud_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-tags_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-hive-thriftserver_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/validation-api-1.1.0.Final.jar:/usr/hdp/current/spark2-client/jars/spark-hive_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xbean-asm5-shaded-4.4.jar:/usr/hdp/current/spark2-client/jars/spark-kvstore_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xercesImpl-2.9.1.jar:/usr/hdp/current/spark2-client/jars/spark-launcher_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-mllib-local_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/xmlenc-0.52.jar:/usr/hdp/current/spark2-client/jars/spar
k-mllib_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-yarn_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-network-common_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spire-macros_2.11-0.13.0.jar:/usr/hdp/current/spark2-client/jars/spark-network-shuffle_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/zookeeper-3.4.6.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-repl_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/zstd-jni-1.3.2-2.jar:/usr/hdp/current/spark2-client/jars/spark-sketch_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-sql_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/current/spark2-client/jars/spark-streaming_2.11-2.3.0.2.6.5.179-4.jar:/usr/hdp/2.6.5.179-4/hadoop/conf/
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.library.path=:/export/home/sobla/oracle_client/instantclient_19_5:/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
20/05/26 15:38:16 INFO ZooKeeper: Client environment:java.compiler=<NA>
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.name=Linux
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.arch=amd64
20/05/26 15:38:16 INFO ZooKeeper: Client environment:os.version=3.10.0-1062.9.1.el7.x86_64
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.name=svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.home=/home/svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Client environment:user.dir=/home/svchdc36q
20/05/26 15:38:16 INFO ZooKeeper: Initiating client connection, connectString=server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@4ed31bc9
20/05/26 15:38:16 INFO ClientCnxn: Opening socket connection to server server2.visa.com/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
20/05/26 15:38:16 INFO ClientCnxn: Socket connection established, initiating session, client: /x.x.x.x:36938, server: server2.visa.com/x.x.x.x:2181
20/05/26 15:38:16 INFO ClientCnxn: Session establishment complete on server server2.visa.com/x.x.x.x:2181, sessionid = 0x27234630fb51f5b, negotiated timeout = 40000
20/05/26 15:38:16 INFO ConnectionStateManager: State change: CONNECTED
20/05/26 15:38:17 INFO ZooKeeper: Session: 0x27234630fb51f5b closed
20/05/26 15:38:17 INFO ClientCnxn: EventThread shut down
20/05/26 15:38:17 INFO Utils: Resolved authority: server2.visa.com:10000
20/05/26 15:38:17 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://server2.visa.com:10000/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:19 INFO HoodieHiveClient: Successfully established Hive connection to  jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
20/05/26 15:38:19 INFO metastore: Trying to connect to metastore with URI thrift://server2.visa.com:9083
20/05/26 15:38:19 INFO metastore: Opened a connection to metastore, current connections: 1
20/05/26 15:38:19 INFO metastore: Connected to metastore.
20/05/26 15:38:19 INFO HiveSyncTool: Trying to sync hoodie table trr with base path /projects/cdp/data/cdp_reporting/trr of type COPY_ON_WRITE
20/05/26 15:38:19 ERROR DBUtil$: [App] *********************** Exception occurred in baseTableWrite for trr : Failed to check if table exists trr
org.apache.hudi.hive.HoodieHiveSyncException: Failed to check if table exists trr
    at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:459)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:91)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    at com.cybs.cdp.reporting.trr.DBUtil$.transactionTableWrite(DBUtil.scala:62)
    at com.cybs.cdp.reporting.trr.TRREngine$.startEngine(TRREngine.scala:45)
    at com.cybs.cdp.reporting.trr.TRREngine$.main(TRREngine.scala:23)
    at com.cybs.cdp.reporting.trr.TRREngine.main(TRREngine.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
    at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
    ... 38 more
Exception in thread "main" java.lang.Exception: Failed to check if table exists trr
    at com.cybs.cdp.reporting.trr.DBUtil$.transactionTableWrite(DBUtil.scala:69)
    at com.cybs.cdp.reporting.trr.TRREngine$.startEngine(TRREngine.scala:45)
    at com.cybs.cdp.reporting.trr.TRREngine$.main(TRREngine.scala:23)
    at com.cybs.cdp.reporting.trr.TRREngine.main(TRREngine.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:906)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)

Thanks, Selva

lamberken commented 4 years ago

Hello @selvarajperiyasamy, try the option --use-jdbc false.

selvarajperiyasamy commented 4 years ago

@lamber-ken Do you mean something like the below in the data source writer? option("use-jdbc", "false")

lamberken commented 4 years ago

Hi @selvarajperiyasamy, my suggestion just now assumed you were using HiveSyncTool directly. For Spark, use:

option("hoodie.datasource.hive_sync.use_jdbc", "false")
selvarajperiyasamy commented 4 years ago

Hi @lamber-ken, is this Spark option available in 0.5.0? I tried it and it didn't work. When I checked the Hudi code base, this string was not found anywhere. Screenshots attached.


lamberken commented 4 years ago

Hi @selvarajperiyasamy, for hudi-0.5.0, use hoodie.datasource.hive_sync.jdbcurl

https://github.com/apache/hudi/blob/release-0.5.0/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
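
For example (just a sketch; the option key is the one from the 0.5.0 DataSourceOptions linked above, and the URL is the same ZooKeeper-discovery JDBC URL already used in this thread):

    option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")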

vinothchandar commented 4 years ago

cc @n3nash as well who made similar changes

selvarajperiyasamy commented 4 years ago

I have already used the settings below, and the error is still the same as mentioned in the ticket.

option(HIVE_SYNC_ENABLED_OPT_KEY, true).
option(HIVE_URL_OPT_KEY, "jdbc:hive2://server1.visa.com:2181,server2.visa.com:2181,server3.visa.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2").
option(HIVE_DATABASE_OPT_KEY, "cdp_reporting").
option(HIVE_TABLE_OPT_KEY, "trr").
option(HIVE_PARTITION_FIELDS_OPT_KEY, "transaction_day,transaction_hour").
option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName).
option("hoodie.datasource.hive_sync.use_jdbc", "false").

vinothchandar commented 4 years ago

@selvarajperiyasamy Actually, the stack trace does show it going over thrift to the metastore.

Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
    at org.apache.hudi.org.apache.hadoop_hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1443)
    at org.apache.hudi.hive.HoodieHiveClient.doesTableExist(HoodieHiveClient.java:457)
    ... 38 more

This might be an issue with Hive 1.2 (we test with Hive 2.x). I assume you are running with CDH? cc @bvaradar, who may know more about this combo.

bvaradar commented 4 years ago

@selvarajperiyasamy : This is indeed caused by the Hive version mismatch. Enabling/disabling JDBC will not help here. With 0.5.0, Hudi moved to Hive 2.x, which was predominantly being used across various deployments. Hive 1.2.x is really old :) and a Hive 1.2.x server is not compatible with Hive 2.x clients. Is it possible to upgrade the Hive environment to Hive 2.x (2.3.3, for example)?

selvarajperiyasamy commented 4 years ago

Thanks Balaji. We are using a shared cluster, and upgrading to 2.x may have an impact on other users. I will check with the cluster owners and see.

Thanks for all of your support.

Thanks, Selva


vinothchandar commented 4 years ago

Let us know how that goes. @bvaradar raised a JIRA to see what, if anything, we can do here. But to add my 2c, Hadoop/Hive vendors are increasingly moving to Hive 3, so upgrading to Hive 2 is a good thing to do nonetheless. At least at Uber, it improved Hive overall from what I remember.

selvarajperiyasamy commented 4 years ago

Sure Vinoth. Thanks!


bvaradar commented 4 years ago

@selvarajperiyasamy : Hope you were able to resolve the issue. Let us know if any help is needed.

cdmikechen commented 4 years ago

@bvaradar I've tested Hudi's DeltaStreamer on the master branch. If I set hoodie.datasource.hive_sync.use_jdbc=false and use the Hive driver class to create the Hive table, it reports java.lang.NoClassDefFoundError: org/json/JSONException. I checked the Spark libs and Hudi jars and found that Hudi uses Hive 2's jars to sync to Hive, but hudi-utilities-bundle does not contain Hive 2's dependency libs.

bvaradar commented 4 years ago

@cdmikechen : Long time :)

The Hudi utilities bundle includes the following Hive jars in shaded form:

                  <include>org.apache.hive:hive-service</include>
                  <include>org.apache.hive:hive-service-rpc</include>
                  <include>org.apache.hive:hive-metastore</include>
                  <include>org.apache.hive:hive-jdbc</include>

Can you attach the whole exception you are seeing? We had a compliance reason for not including org.json classes (due to licensing issues).

vinothchandar commented 4 years ago

@cdmikechen Please let us know the whole exception. If we can repro, we'd ideally like to fix it before 0.6.0 goes out.

ruztbucket commented 3 years ago

Hi, I'm facing the same issue when trying to sync to Hive with hoodie.datasource.hive_sync.use_jdbc=false.

This is the complete stack trace:


20/12/01 11:47:07 INFO Client: Application report for application_1606814313006_0010 (state: RUNNING)
20/12/01 11:47:08 INFO Client: Application report for application_1606814313006_0010 (state: FINISHED)
20/12/01 11:47:08 INFO Client: 
     client token: N/A
     diagnostics: User class threw exception: java.lang.NoClassDefFoundError: org/json/JSONException
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
    at com.amazon.smt.wes.r2.SimpleJob$.main(SimpleJob.scala:54)
    at com.amazon.smt.wes.r2.SimpleJob.main(SimpleJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 58 more

     ApplicationMaster host: ip-10-0-1-32.ec2.internal
     ApplicationMaster RPC port: 35829
     queue: default
     start time: 1606822996408
     final status: FAILED
     tracking URL: http://ip-10-0-1-75.ec2.internal:20888/proxy/application_1606814313006_0010/
     user: hadoop
20/12/01 11:47:08 ERROR Client: Application diagnostics message: User class threw exception: java.lang.NoClassDefFoundError: org/json/JSONException
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
    at com.amazon.smt.wes.r2.SimpleJob$.main(SimpleJob.scala:54)
    at com.amazon.smt.wes.r2.SimpleJob.main(SimpleJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:685)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 58 more

Exception in thread "main" org.apache.spark.SparkException: Application application_1606814313006_0010 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1529)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/12/01 11:47:08 INFO ShutdownHookManager: Shutdown hook called
rakeshramakrishnan commented 3 years ago

I'm also facing the same issue as documented by @ruztbucket. I'm using Hudi 0.6.0.

kimberlyamandalu commented 3 years ago

I am also experiencing this error on Hudi 0.6.0, EMR 5.31.0. I am setting hoodie.datasource.hive_sync.use_jdbc to false, but it does not fix anything. What do we need to set as our Hive URL when we have specified the Glue catalog as the metastore in both the hive-site and hive-spark-site configs?

I've tried referencing the json-1.8.jar found at /usr/lib/hive/lib/json-1.8.jar on my EMR server in my --jars parameter, but that does not fix the issue either.

nsivabalan commented 3 years ago

@bvaradar : Can you please follow up on this ticket when you can?

bvaradar commented 3 years ago

@kimberlyamandalu : Sorry for the delay. This is weird. Can you check if org/json/JSONException is present in /usr/lib/hive/lib/json-1.8.jar?
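
If it helps, here is a quick way to verify that from a spark-shell on the EMR master node. Just a sketch; the jar path is the one mentioned above.

    import java.util.jar.JarFile
    import scala.collection.JavaConverters._

    // List the jar entries and check whether org/json/JSONException.class is packaged in it.
    val jar = new JarFile("/usr/lib/hive/lib/json-1.8.jar")
    val found = jar.entries().asScala.exists(_.getName == "org/json/JSONException.class")
    println(s"org/json/JSONException.class present: $found")
    jar.close()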

nikspatel03 commented 3 years ago

I'm also facing the same issue mentioned by @ruztbucket. I'm using EMR 5.31.0 - Hudi 0.6.0 - Hive 2.3.7.

nsivabalan commented 3 years ago

@bvaradar : FYI, I have created a critical JIRA for this: https://issues.apache.org/jira/browse/HUDI-1609. Please reduce the priority if you feel otherwise.

nsivabalan commented 3 years ago

@bvaradar : I could not reproduce this with the local Docker setup. Do you have any pointers on how to go about triaging this? Also, I am running into another issue locally, which I documented in https://issues.apache.org/jira/browse/HUDI-1609.

nsivabalan commented 3 years ago

@kimberlyamandalu : In the meantime, would you mind responding to Balaji's questions above?

nsivabalan commented 3 years ago

Here is my understanding: Hudi does not bundle a fat jar for hive-exec (the fat jar pulls in a lot of unwanted jars) and instead relies on those jars being available in the environment; Hudi does not pull this jar explicitly into any of the bundles. But this flow has been working before without any issues. @bvaradar @n3nash: Could it be some jar version mismatch? Any pointers would be appreciated. As I said earlier, I am running into some issues locally and hence couldn't reproduce. Your help is much appreciated here.
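
For what it's worth, here is a minimal sketch of making the environment's Hive jars visible to the Spark driver and executors. The jar locations are assumptions for an EMR-style layout and will differ per distribution, and classpath settings are read at JVM startup, so in practice they are usually supplied via --conf at submit time or in spark-defaults rather than on an already-running session.

    import org.apache.spark.sql.SparkSession

    // Sketch only: hypothetical paths; hive-exec (and the json jar referenced by Hive's
    // SemanticAnalyzer in the stack traces above) must exist at these locations on every node.
    val spark = SparkSession.builder().
      appName("hudi-hive-sync").
      config("spark.driver.extraClassPath", "/usr/lib/hive/lib/hive-exec.jar:/usr/lib/hive/lib/json-1.8.jar").
      config("spark.executor.extraClassPath", "/usr/lib/hive/lib/hive-exec.jar:/usr/lib/hive/lib/json-1.8.jar").
      enableHiveSupport().
      getOrCreate()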

rubenssoto commented 3 years ago

@nsivabalan I had a different error. Hudi 0.7.0, EMR 6.2.

Exception in thread "ForkJoinPool-1-worker-13" java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:401) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:384) at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:374) at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263) at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136) at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94) at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403) at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399) at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460) at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218) at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123) at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227) at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132) at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104) at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132) at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288) at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48) at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:46) at jobs.TableProcessor.start(TableProcessor.scala:86) at TableProcessorWrapper$.$anonfun$main$2(TableProcessorWrapper.scala:23) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) Caused by: java.lang.ClassNotFoundException: org.apache.calcite.rel.type.RelDataTypeSystem at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 65 more

rubenssoto commented 3 years ago

I gave it another try on EMR 5.32, Hudi 0.6.0.

First ERROR:

21/02/24 22:27:25 WARN FileUtils: Error setting permissions of hdfs://ip-10-0-28-211.us-west-2.compute.internal:8020/user/spark/warehouse/raw_courier_api_hudi.db
java.io.IOException: Unable to set permissions of hdfs://ip-10-0-28-211.us-west-2.compute.internal:8020/user/spark/warehouse/raw_courier_api_hudi.db
    at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:758)
    at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:527)
    at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:201)
    at com.amazonaws.glue.catalog.util.MetastoreClientUtils.makeDirs(MetastoreClientUtils.java:43)
    at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.createDatabase(GlueMetastoreClientDelegate.java:236)
    at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.createDatabase(AWSCatalogMetastoreClient.java:272)
    at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:316)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:3895)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:271)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
    at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48)
    at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:38)
    at jobs.TableProcessor.start(TableProcessor.scala:77)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply$mcV$sp(TableProcessorWrapper.scala:23)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.NumberFormatException: For input string: "testingforemptydefaultvalue"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.hadoop.conf.Configuration.getInts(Configuration.java:1402)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:332)
    at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:355)
    at org.apache.hadoop.fs.FsShell.init(FsShell.java:96)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:296)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.run(HadoopShimsSecure.java:377)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:729)
    ... 66 more

Second ERROR:

Exception in thread "ForkJoinPool-1-worker-2" java.lang.NoClassDefFoundError: org/json/JSONException
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:262)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:176)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:130)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:197)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:169)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:112)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$executeQuery$1(SQLExecution.scala:83)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:94)
    at org.apache.spark.sql.execution.QueryExecutionMetrics$.withMetrics(QueryExecutionMetrics.scala:141)
    at org.apache.spark.sql.execution.SQLExecution$.org$apache$spark$sql$execution$SQLExecution$$withMetrics(SQLExecution.scala:178)
    at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:93)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:200)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
    at hudiwriter.HudiWriter.createHudiTable(HudiWriter.scala:48)
    at hudiwriter.HudiContext.writeToHudi(HudiContext.scala:38)
    at jobs.TableProcessor.start(TableProcessor.scala:77)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply$mcV$sp(TableProcessorWrapper.scala:23)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
    at TableProcessorWrapper$$anonfun$1$$anonfun$apply$1.apply(TableProcessorWrapper.scala:23)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 64 more

kimberlyamandalu commented 3 years ago

> @kimberlyamandalu : Sorry for the delay. This is weird. Can you check if org/json/JSONException is present in /usr/lib/hive/lib/json-1.8.jar ?

@bvaradar Sorry for the delayed response. Yes, the JSONException class is present in this jar:

$ jar -xvf json-1.8.jar
  created: META-INF/
 inflated: META-INF/MANIFEST.MF
  created: org/
  created: org/json/
 inflated: org/json/JSON.class
 inflated: org/json/JSONArray.class
 inflated: org/json/JSONException.class
 inflated: org/json/JSONObject$1.class
 inflated: org/json/JSONObject.class
 inflated: org/json/JSONString.class
 inflated: org/json/JSONStringer$Scope.class
 inflated: org/json/JSONStringer.class
 inflated: org/json/JSONTokener.class
  created: META-INF/maven/
  created: META-INF/maven/com.tdunning/
  created: META-INF/maven/com.tdunning/json/
 inflated: META-INF/maven/com.tdunning/json/pom.xml
 inflated: META-INF/maven/com.tdunning/json/pom.properties
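
As a quick aside, the same check can be scripted, since a jar is just a zip archive. A minimal sketch, assuming the jar path quoted above:

```python
import zipfile

# Hypothetical check: confirm that org/json/JSONException.class is packaged in the
# jar discussed above. Jars are plain zip archives, so zipfile can list their entries.
jar_path = "/usr/lib/hive/lib/json-1.8.jar"
with zipfile.ZipFile(jar_path) as jar:
    print("org/json/JSONException.class" in jar.namelist())  # True if the class is present
```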

ismailsimsek commented 3 years ago

This comment might help with the second error (#1751 (comment)): java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem

org.apache.calcite:calcite-core:1.16.0
org.apache.thrift:libfb303:0.9.3

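One way to pull those coordinates onto the Spark classpath is spark.jars.packages (or the equivalent --packages flag on spark-submit), set before the SparkSession, and hence the JVM, is created. A sketch follows; the versions are simply the ones quoted above, not a verified combination.

```python
from pyspark.sql import SparkSession

# Sketch only: resolve the calcite-core and libfb303 artifacts listed above from
# Maven Central and add them to the driver and executor classpaths.
spark = (
    SparkSession.builder
    .appName("hudi-hive-sync-deps")
    .config(
        "spark.jars.packages",
        "org.apache.calcite:calcite-core:1.16.0,org.apache.thrift:libfb303:0.9.3",
    )
    .getOrCreate()
)
```
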
ngonik commented 3 years ago

Hey, I'm having the same issue with JSONException on EMR as mentioned above. Is there any update on that? Anything I can help with to make it work? Thanks!

ngonik commented 3 years ago

I was able to fix the JSONException error on EMR. Just needed to manually add the org.json (https://mvnrepository.com/artifact/org.json/json) package to both executor and driver extraClassPath config when deploying the cluster.
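
For reference, a sketch of what that cluster-level setting could look like, written as the EMR "Configurations" structure (the same shape is accepted by the console, the CLI, or boto3). The jar location is an assumption taken from the path quoted earlier in this thread, and on EMR these two properties already carry a long default list of paths, so the org.json jar should be appended to the existing value rather than replacing it.

```python
# Sketch of an EMR configuration entry appending the org.json jar to both class paths.
# "<existing paths>" stands for whatever the cluster's default extraClassPath already is.
emr_spark_classpath_config = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.driver.extraClassPath": "<existing paths>:/usr/lib/hive/lib/json-1.8.jar",
            "spark.executor.extraClassPath": "<existing paths>:/usr/lib/hive/lib/json-1.8.jar",
        },
    }
]
```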

diogodilcl commented 3 years ago

Hudi version: 0.7.0, EMR: 6.2

Hi,

When I use:

"hoodie.datasource.hive_sync.use_jdbc":"false"

I have the following exception:

21/04/28 22:19:49 ERROR HiveSyncTool: Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:406)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:384)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:374)
    at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:263)
    at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:181)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:136)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
    at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:355)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4(HoodieSparkSqlWriter.scala:403)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$4$adapted(HoodieSparkSqlWriter.scala:399)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
    at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:399)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:460)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:218)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:124)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:123)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
    at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Socket Factory class not found: java.lang.ClassNotFoundException: Class testingforemptydefaultvalue not found
    at org.apache.hadoop.net.NetUtils.getSocketFactoryFromProperty(NetUtils.java:143)
    at org.apache.hadoop.net.NetUtils.getSocketFactory(NetUtils.java:98)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:309)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:290)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:171)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3358)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:483)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:234)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:583)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:548)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:528)
    at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:394)
    ... 51 more

Existing tables are updated, but for tables that need to be created I get the exception above.
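
For context, a minimal PySpark sketch of the kind of write that exercises this sync path; the hive_sync options are the ones discussed in this thread, while the DataFrame, table, database, record key, and S3 path are placeholder values, not the reporter's actual job:

```python
from pyspark.sql import SparkSession

# Illustrative only: assumes the Hudi Spark bundle is on the classpath (as on EMR 6.x)
# and that the Hive/Glue metastore is reachable from the cluster.
spark = SparkSession.builder.appName("hudi-metastore-sync-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, "a", "2021-04-28 00:00:00")],
    ["id", "payload", "updated_at"],
)

hudi_options = {
    "hoodie.table.name": "my_table",                   # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.use_jdbc": "false",   # sync via the metastore client, not HiveServer2 JDBC
    "hoodie.datasource.hive_sync.database": "my_db",   # hypothetical database
    "hoodie.datasource.hive_sync.table": "my_table",
}

(
    df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/hudi/my_table")              # hypothetical base path
)
```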

n3nash commented 3 years ago

@diogodilcl Are you able to reproduce this issue consistently? Could you provide some steps to reproduce it so we can find a resolution?

n3nash commented 3 years ago

Closing this ticket due to inactivity. There is a PR open that will provide ways to disable JDBC.