Closed. rakeshramakrishnan closed this issue 2 years ago.
@satishkotha: Can you help with this?
@rakeshramakrishnan From the logs, I do see that the table default.hive_hudi_sync is created correctly and available in the catalog:
25064 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS default.hive_hudi_sync ( _hoodie_commit_time string, _hoodie_commit_seqno string, _hoodie_record_key string, _hoodie_partition_path string, _hoodie_file_name string, begin_lat double, begin_lon double, driver string, end_lat double, end_lon double, fare double, rider string, ts double, uuid string) PARTITIONED BY (partitionpath string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync']: 317 ms
spark.catalog.listTables() ------> Table(name='hive_hudi_sync', ...
Why do you think the table is not available? I don't see any other errors in the logs you shared.
@satishkotha: I can see the table created in the local spark catalog's hive database, but not in the remote hive metastore. There are no error logs.
@bvaradar @satishkotha: Will the PR #2449 address this issue? However, the PR seems to be for the standalone hive sync tool. Or does the hive sync within the hudi write use the same module?
Regarding PR #2449: I'm making adjustments; wait until it is fully ready, then you can try it.
@rakeshramakrishnan Could you try the above patch from @Trevor-zhang and see if that fixes your issue?
@n3nash The PR #2449 is closed now. Is there any other PR that tracks this issue?
@rakeshramakrishnan: it would be nice if you could respond with any recent updates.
@rakeshramakrishnan: if I'm not wrong, hive sync with a metastore has been working with hudi (anecdotally, from the community). So it may be some jar mismatch issue. Even without the aforementioned patch (#2449), it was working before; 2449 just adds explicit configs. Prior to this, hive sync uses properties from the Hadoop conf; that's the only difference. As Satish mentioned, we don't see any errors in the log attached.
Can you get us a full stack trace if possible?
@nsivabalan: There are no errors; however, through hudi, the connection is made to the local hive metastore (from spark). It doesn't connect to the external hive metastore.
But without hudi, the spark catalog fetches hive tables from the external metastore:
from pyspark.sql import SparkSession

# metastore_uri is the thrift URI of the external hive metastore
spark = SparkSession.builder \
    .appName("test-hudi-hive-sync") \
    .enableHiveSupport() \
    .config("hive.metastore.uris", metastore_uri) \
    .getOrCreate()
print("Before {}".format(spark.catalog.listTables())) ------> returns tables from `metastore_uri`
@rakeshramakrishnan For hive sync to work inline through Hudi, the hive-site.xml with the metastore configs needs to be available on the classpath of the Spark job.
I tried to reproduce with a remote MySQL database as the metastore. My jdbc-specific configs in hive-site.xml look as follows:
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true",
"javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
"javax.jdo.option.ConnectionUserName": "username",
"javax.jdo.option.ConnectionPassword": "password"
Then the following pyspark script works:
pyspark \
> --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
> --conf "spark.sql.hive.convertMetastoreParquet=false" \
> --jars /home/hadoop/hudi-spark3-bundle_2.12-0.10.0-SNAPSHOT.jar,/usr/lib/spark/external/lib/spark-avro.jar
...
...
Using Python version 3.7.9 (default, Aug 27 2020 21:59:41)
SparkSession available as 'spark'.
>>> from pyspark.sql import functions as F
>>>
>>> inputDF = spark.createDataFrame([
... ("100", "2015/01/01", "2015-01-01T13:51:39.340396Z"),
... ("101", "2015/01/01", "2015-01-01T12:14:58.597216Z"),
... ("102", "2015/01/01", "2015-01-01T13:51:40.417052Z"),
... ("103", "2015/01/01", "2015-01-01T13:51:40.519832Z"),
... ("104", "2015/01/02", "2015-01-01T12:15:00.512679Z"),
... ("105", "2015/01/02", "2015-01-01T13:51:42.248818Z")],
... ["id", "creation_date", "last_update_time"])
>>>
>>> hudiOptions = {
... "hoodie.table.name" : "hudi_hive_table",
... "hoodie.datasource.write.table.type" : "COPY_ON_WRITE",
... "hoodie.datasource.write.operation" : "insert",
... "hoodie.datasource.write.recordkey.field" : "id",
... "hoodie.datasource.write.partitionpath.field" : "creation_date",
... "hoodie.datasource.write.precombine.field" : "last_update_time",
... "hoodie.datasource.hive_sync.enable" : "true",
... "hoodie.datasource.hive_sync.table" : "hudi_hive_table",
... "hoodie.datasource.hive_sync.partition_fields" : "creation_date"
... }
>>>
>>> inputDF.write.format("org.apache.hudi").options(**hudiOptions).mode("overwrite").save("s3://huditestbkt/hive_sync/")
21/09/29 10:22:08 WARN HoodieSparkSqlWriter$: hoodie table at s3://huditestbkt/hive_sync already exists. Deleting existing data & overwriting with new data.
21/09/29 10:22:34 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
>>>
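To double-check that the sync reached the remote metastore rather than a local one, the table can be listed back through the Hive catalog from the same (or a fresh) session pointed at that metastore; a small sketch assuming the table name used above:

# Sketch: confirm the synced table is visible through the Hive catalog.
spark.sql("show tables in default").show(truncate=False)
print([t.name for t in spark.catalog.listTables("default")])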
Had the same issue, using Scala, Spark and DataSourceWriteOptions.HIVE_SYNC_MODE.key() -> "hms". Adding a hive-site.xml with the URL to a src/main/resources folder fixed it for me.
If this is intended, maybe it should be added to the documentation? It feels a bit weird that you specify a URL with DataSourceWriteOptions.HIVE_URL but it has no effect?
@matthiasdg Could you help me?
I don't understand your solution; I don't recognize the src/main/resources path.
@rubenssoto You have to make sure the hive-site.xml can be found on the classpath. For java, scala projects you typically use resources folders for that. Not sure what/how your project is...
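For a PySpark job the same requirement applies: hive-site.xml has to be resolvable by the driver JVM (for example under $SPARK_HOME/conf). One way to check this from Python, going through the py4j gateway (an internal but commonly used handle), might be:

# Sketch: ask the driver JVM whether hive-site.xml is visible on its classpath.
loader = spark.sparkContext._jvm.java.lang.Thread.currentThread().getContextClassLoader()
print("hive-site.xml on classpath:", loader.getResource("hive-site.xml"))  # None means it is not picked up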
@matthiasdg
It is a python project and my hive-site.xml is on the spark classpath.
But I keep receiving this error from hudi:
Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
But this table exists in my metastore database.
@rubenssoto: can you confirm that all connection configs are intact in your setup?
These are the ones that worked for Sagar:
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true", "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver", "javax.jdo.option.ConnectionUserName": "username", "javax.jdo.option.ConnectionPassword": "password"
Alternatively, you can also try "hms" mode instead of jdbc. I will let @codope follow up from here.
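For reference, the "hms" suggestion translates to write options along these lines in pyspark; this is a sketch only, since the metastore.uris key is only available in newer Hudi releases and the thrift URI below is a placeholder:

# Sketch: sync through the Hive metastore client ("hms") instead of JDBC.
hudi_hms_options = {
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.mode": "hms",
    # Assumption: this key exists in newer Hudi versions; otherwise the URI must come from hive-site.xml.
    "hoodie.datasource.hive_sync.metastore.uris": "thrift://metastore-host:9083",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": "hudi_hive_table",
    "hoodie.datasource.hive_sync.partition_fields": "creation_date",
}

These can be merged into the hudiOptions dict from the example above before the write.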
Will go ahead and close this one out as we have a solution proposed. Feel free to re-open if you are still encountering issues.
Describe the problem you faced
Unable to sync to an external hive metastore via the thrift protocol. Instead, the sync seems to happen with the local hive store.

To Reproduce
Run a pyspark file as below, which connects to the external hive metastore via hive.metastore.uris using the thrift protocol and prints the existing tables with spark.catalog.listTables(), to show that the existing setup is able to connect to the metastore without any issues (HiveMetastoreConnection version 1.2.1 using Spark classes). I have tried connecting to the hive metastore using spark 3.0.1 and hive 2.3.7 jars and was able to list the tables in the external metastore. However, I was unable to use it with hudi 0.6.0, and hence used spark 2.4.7 for the below example.

Expected behavior
hive_hudi_sync to show up in the external hive metastore after hive sync.

Environment Description

Additional context
Have attached the run logs. Logs from org.apache.spark have been removed because they were adding to the noise. If I need to attach them, do let me know.