apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Hoodie table not found in path Unable to find a hudi table for the user provided paths. #2282

Closed wosow closed 3 years ago

wosow commented 3 years ago


An error occurred when I used Hudi 0.6.0 with Spark 2.4.4 to write data to Hudi and sync it to Hive, as follows:

```
20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bd93bc3{/SQL/execution/json,null,AVAILABLE,@Spark}
20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e67cfe1{/static/sql,null,AVAILABLE,@Spark}
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.228.86.12:42864) with ID 3
20/11/26 14:22:52 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/11/26 14:22:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager lake03:40372 with 8.4 GB RAM, BlockManagerId(3, lake03, 40372, None)
20/11/26 14:22:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, spark_hadoop_conf.xml, file:/opt/modules/spark-2.4.4/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1481461246_1, ugi=root (auth:SIMPLE)]]]
20/11/26 14:22:52 INFO hudi.DataSourceUtils: Getting table path..
20/11/26 14:22:52 INFO util.TablePathUtils: Getting table path from path : hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet
Exception in thread "main" org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
	at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:120)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:72)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at com.ws.hudi.wdt.cow.StockOutOrder$.stockOutOrderIncUpdate(StockOutOrder.scala:104)
	at com.ws.hudi.wdt.cow.StockOutOrder$.main(StockOutOrder.scala:41)
	at com.ws.hudi.wdt.cow.StockOutOrder.main(StockOutOrder.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/11/26 14:22:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/11/26 14:22:52 INFO server.AbstractConnector: Stopped Spark@76b224cd{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/11/26 14:22:52 INFO ui.SparkUI: Stopped Spark web UI at http://lake03:4040
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/11/26 14:22:52 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Stopped
20/11/26 14:22:55 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/11/26 14:22:55 INFO memory.MemoryStore: MemoryStore cleared
20/11/26 14:22:55 INFO storage.BlockManager: BlockManager stopped
20/11/26 14:22:55 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/11/26 14:22:55 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/11/26 14:22:55 INFO spark.SparkContext: Successfully stopped SparkContext
20/11/26 14:22:55 INFO util.ShutdownHookManager: Shutdown hook called
```



bvaradar commented 3 years ago

It looks like the error is happening while loading the data at hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet

Can you check whether hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/ is a Hudi table? Do you see a hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/.hoodie folder?

Can you list the entire folder hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 and attach the output?
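The two checks above can be sketched as HDFS shell commands (assuming shell access to a node with the Hadoop client configured; the paths are the ones from this thread):

```shell
# List the full directory to see what the folder actually contains.
hdfs dfs -ls hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/

# A Hudi table root always contains a .hoodie metadata folder;
# if this listing fails, the path is not a Hudi table and the
# hudi datasource will throw TableNotFoundException when reading it.
hdfs dfs -ls hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/.hoodie
```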

wosow commented 3 years ago

The entire folder (hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125) is as follows:

| Permission | Owner | Group | Size | Last Modified | Replication | Block Size | Name |
|---|---|---|---|---|---|---|---|
| drwxr-xr-x | root | supergroup | 0 B | 2020/11/25 4:18:26 PM | 0 | 0 B | .metadata |
| drwxr-xr-x | root | supergroup | 0 B | 2020/11/25 4:19:01 PM | 0 | 0 B | .signals |
| -rw-r--r-- | root | supergroup | 10.27 MB | 2020/11/25 4:19:00 PM | 1 | 128 MB | 231939a9-ebe4-4a2b-9338-badf75ee9f49.parquet |

The question is that this worked when I was using 0.5.3, but it does not work with 0.6.0.

hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 is the destination of the Sqoop import, not the Hudi table directory.

bvaradar commented 3 years ago

@wosow : If this is a plain parquet dataset, you should read it like spark.read.parquet("hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*") and not use the hudi format.
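A minimal sketch of the two read paths, for anyone hitting the same error (assumes an existing `SparkSession` named `spark` with the Hudi Spark bundle on the classpath; the Sqoop path is from this thread, while `/path/to/hudi_table` is a hypothetical placeholder):

```scala
// Plain parquet directory (e.g. a Sqoop import target): use the parquet source.
val sqoopDf = spark.read.parquet(
  "hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*")

// Hudi table (a directory whose root contains a .hoodie folder): use the hudi source.
// If no .hoodie folder is found for the provided path, Hudi throws
// TableNotFoundException, which is exactly the error in this issue.
val hudiDf = spark.read
  .format("hudi") // "org.apache.hudi" also works
  .load("hdfs://nameservice/path/to/hudi_table/*")
```

The key distinction: the hudi datasource resolves the table root from the input path by walking up to find `.hoodie`, so pointing it at a directory that was never written by Hudi can only fail.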

wosow commented 3 years ago

> @wosow : If this is a plain parquet dataset, you should be reading like spark.read.parquet("hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*") and not use hudi format.

Thank you, I will try.

bvaradar commented 3 years ago

@wosow : Please reopen if you are still stuck.