Closed: wosow closed this issue 3 years ago.
It looks like the error is happening while loading the data at hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet.
Can you check whether hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/ is a Hudi table? Do you see an hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/.hoodie folder?
Can you list the entire folder hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 and attach the output?
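A quick way to script the `.hoodie` check suggested above (a minimal sketch against a local filesystem path; on a real cluster you would run `hdfs dfs -ls <path>/.hoodie` or use an HDFS client instead of `os.path`):

```python
import os

def looks_like_hudi_table(path):
    """A Hudi table root contains a .hoodie metadata folder."""
    return os.path.isdir(os.path.join(path, ".hoodie"))
```

If this returns False for the directory you are reading, the Hudi data source cannot treat it as a table.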
The listing of the entire folder (hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125) is as follows:
Permission | Owner | Group | Size | Last Modified | Replication | Block Size | Name
---|---|---|---|---|---|---|---
drwxr-xr-x | root | supergroup | 0 B | 2020/11/25 4:18:26 PM | 0 | 0 B | .metadata
drwxr-xr-x | root | supergroup | 0 B | 2020/11/25 4:19:01 PM | 0 | 0 B | .signals
-rw-r--r-- | root | supergroup | 10.27 MB | 2020/11/25 4:19:00 PM | 1 | 128 MB | 231939a9-ebe4-4a2b-9338-badf75ee9f49.parquet
The question is: with 0.5.3 this works, but with 0.6.0 it does not.
hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125 is the destination of the Sqoop import, not a Hudi table directory.
@wosow : If this is a plain parquet dataset, you should be reading like spark.read.parquet("hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/*") and not use hudi format.
Thank you, I will try that.
@wosow : Please reopen if you are still stuck.
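For background on the error: roughly speaking, Hudi's table-path resolution (the `TablePathUtils.getTablePath` step visible in the log) looks upward from the supplied path for a directory containing `.hoodie`, and raises `TableNotFoundException` when none is found. A rough, hypothetical Python sketch of that lookup (not Hudi's actual implementation, and using a local filesystem in place of HDFS):

```python
import os

def find_hudi_table_path(path):
    # Walk upward from `path`; the first ancestor containing a
    # .hoodie folder is treated as the Hudi table root.
    current = os.path.abspath(path)
    while True:
        if os.path.isdir(os.path.join(current, ".hoodie")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding .hoodie.
            return None
        current = parent
```

This matches the listing above: the Sqoop output directory contains only `.metadata`, `.signals`, and a parquet file, with no `.hoodie` folder anywhere on the path, so the lookup fails and plain `spark.read.parquet` is the right way to read it.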
Tips before filing an issue
- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
- If you have triaged this as a bug, then file an issue directly.
An error occurred when I used Hudi 0.6.0 with Spark 2.4.4 to write data to Hudi and synchronize it to Hive, as follows:
```
20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@bd93bc3{/SQL/execution/json,null,AVAILABLE,@Spark}
20/11/26 14:22:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
20/11/26 14:22:51 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e67cfe1{/static/sql,null,AVAILABLE,@Spark}
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.228.86.12:42864) with ID 3
20/11/26 14:22:52 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/11/26 14:22:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager lake03:40372 with 8.4 GB RAM, BlockManagerId(3, lake03, 40372, None)
20/11/26 14:22:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, spark_hadoop_conf.xml, file:/opt/modules/spark-2.4.4/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1481461246_1, ugi=root (auth:SIMPLE)]]]
20/11/26 14:22:52 INFO hudi.DataSourceUtils: Getting table path..
20/11/26 14:22:52 INFO util.TablePathUtils: Getting table path from path : hdfs://nameservice/data/wdt/sqoop/cow/inc/stockout_order_20201125/837b6714-40b3-4a00-bcf5-97a6f33d2af7.parquet
Exception in thread "main" org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
	at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:120)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:72)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at com.ws.hudi.wdt.cow.StockOutOrder$.stockOutOrderIncUpdate(StockOutOrder.scala:104)
	at com.ws.hudi.wdt.cow.StockOutOrder$.main(StockOutOrder.scala:41)
	at com.ws.hudi.wdt.cow.StockOutOrder.main(StockOutOrder.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/11/26 14:22:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
20/11/26 14:22:52 INFO server.AbstractConnector: Stopped Spark@76b224cd{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/11/26 14:22:52 INFO ui.SparkUI: Stopped Spark web UI at http://lake03:4040
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
20/11/26 14:22:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
20/11/26 14:22:52 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
20/11/26 14:22:52 INFO cluster.YarnClientSchedulerBackend: Stopped
20/11/26 14:22:55 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/11/26 14:22:55 INFO memory.MemoryStore: MemoryStore cleared
20/11/26 14:22:55 INFO storage.BlockManager: BlockManager stopped
20/11/26 14:22:55 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/11/26 14:22:55 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/11/26 14:22:55 INFO spark.SparkContext: Successfully stopped SparkContext
20/11/26 14:22:55 INFO util.ShutdownHookManager: Shutdown hook called
```
Environment Description