apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

Hudi metadata to hive metastore Get the parquet schema for this table looking at the latest commit OOM #9180

Open SendDreams opened 1 year ago

SendDreams commented 1 year ago

Trying to sync hoodie table acs_camain_huo with base path hdfs://GWNS:8020/user/hudi/acs_camain_hu of type MERGE_ON_READ
[INFO ] 2023-07-13 11:09:14,065 method:org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:599) Connected to metastore.
[INFO ] 2023-07-13 11:09:14,065 method:org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:156) Syncing target hoodie table with hive table(ods.acs_wffins_hu). Hive metastore URL from HiveConf:thrift://bdp-utility-p-0001:9083). Hive metastore URL from HiveSyncConfig:thrift://bdp-utility-p-0001:9083, basePath :hdfs://GWNS:8020/user/hudi/acs_wffins_hu
[INFO ] 2023-07-13 11:09:14,066 method:org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:214) Trying to sync hoodie table acs_wffins_huo with base path hdfs://GWNS:8020/user/hudi/acs_wffins_hu of type MERGE_ON_READ
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/benpin_report_hu_real_job_02_oom ...
Unable to create /tmp/benpin_report_hu_real_job_02_oom: File exists
[INFO ] 2023-07-13 11:09:15,248 method:org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:628) Closed a connection to metastore, current connections: 2
[ERROR] 2023-07-13 11:09:15,249 method:org.apache.hudi.sink.utils.NonThrownExecutor.handleException(NonThrownExecutor.java:140) Executor executes action [sync hive metadata for instant 20230713110913478] error
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hudi.common.table.log.block.HoodieLogBlock.tryReadContent(HoodieLogBlock.java:267)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:194)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:411)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:70)
    at org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:375)
    at org.apache.hudi.common.table.TableSchemaResolver.readSchemaFromLogFile(TableSchemaResolver.java:363)
    at org.apache.hudi.common.table.TableSchemaResolver.fetchSchemaFromFiles(TableSchemaResolver.java:511)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:272)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:116)
    at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:444)
    at org.apache.hudi.common.table.TableSchemaResolver$$Lambda$1800/1082959049.get(Unknown Source)
    at org.apache.hudi.util.Lazy.get(Lazy.java:54)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromLatestCommitMetadata(TableSchemaResolver.java:229)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaInternal(TableSchemaResolver.java:197)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:137)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchema(TableSchemaResolver.java:177)
    at org.apache.hudi.sync.common.HoodieSyncClient.getStorageSchema(HoodieSyncClient.java:110)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:238)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:188)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:162)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator.doSyncHive(StreamWriteOperatorCoordinator.java:340)
    at org.apache.hudi.sink.StreamWriteOperatorCoordinator$$Lambda$1789/2130655350.run(Unknown Source)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
    at org.apache.hudi.sink.utils.NonThrownExecutor$$Lambda$1443/2107592834.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
??????:{ "at": { "atMobiles":[ "null" ], "isAtAll": true }, "text": { "content":"flink??[benpin_report_hu_real_job_02]???????" }, "msgtype":"text"}
{"errcode":300001,"errmsg":"????: robot ????????:??? token ?????"}
[INFO ] 2023-07-13 11:09:15,805 method:org.apache.hudi.hive.ddl.HMSDDLExecutor.getTableSchema(HMSDDLExecutor.java:191) Time taken to getTableSchema: 10 ms
[INFO ] 2023-07-13 11:09:15,802 method:org.apache.hudi.hive.ddl.HMSDDLExecutor.getTableSchema(HMSDDLExecutor.java:191)
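For context, the stack trace shows the Hive sync that the Flink writer's StreamWriteOperatorCoordinator triggers after a commit; the OOM happens while TableSchemaResolver reads a log block to infer the table schema. Below is a minimal sketch of how Hive sync is typically wired into a Flink Hudi sink with HoodiePipeline. The column list, key, and option values are placeholders loosely based on the log, not the actual job configuration.

```java
// Sketch only: enabling Hive sync (HMS mode) on the Flink Hudi sink.
// Schema, key, path, database, table, and metastore URI are placeholders.
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.HoodiePipeline;

public class HiveSyncSinkSketch {
  public static void sink(DataStream<RowData> input) {
    Map<String, String> options = new HashMap<>();
    options.put(FlinkOptions.PATH.key(), "hdfs://GWNS:8020/user/hudi/acs_wffins_hu");
    options.put(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_READ");
    // Hive sync runs inside the StreamWriteOperatorCoordinator after each
    // successful commit, which is where the OOM in the log is thrown.
    options.put(FlinkOptions.HIVE_SYNC_ENABLED.key(), "true");
    options.put(FlinkOptions.HIVE_SYNC_MODE.key(), "hms");
    options.put(FlinkOptions.HIVE_SYNC_METASTORE_URIS.key(), "thrift://bdp-utility-p-0001:9083");
    options.put(FlinkOptions.HIVE_SYNC_DB.key(), "ods");
    options.put(FlinkOptions.HIVE_SYNC_TABLE.key(), "acs_wffins_hu");

    HoodiePipeline.Builder builder = HoodiePipeline.builder("acs_wffins_hu")
        .column("id BIGINT")          // hypothetical schema
        .column("ts TIMESTAMP(3)")
        .pk("id")
        .options(options);
    builder.sink(input, false);       // bounded = false for a streaming job
  }
}
```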

SendDreams commented 1 year ago

Can anyone help me answer this question?

ad1happy2go commented 1 year ago

@SendDreams Are you running a Hive sync job? It is failing with a Java heap space error. You might consider increasing the memory.
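Since the sync is triggered from the operator coordinator, which runs with the JobManager, its heap is the first thing to grow. A minimal sketch of the relevant Flink memory settings is below; the sizes are illustrative, and in practice these keys usually belong in flink-conf.yaml or the submission command rather than in code (values set programmatically only take effect for local execution).

```java
// Sketch only: the Flink memory keys that would need to be raised.
// jobmanager.memory.process.size / taskmanager.memory.process.size are
// normally set in flink-conf.yaml or via -D flags at submission time.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryConfigSketch {
  public static StreamExecutionEnvironment createLocalEnv() {
    Configuration conf = new Configuration();
    // Hive sync (and the schema read that OOMs) runs in the operator
    // coordinator on the JobManager, so grow its heap first.
    conf.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("4g"));
    conf.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse("8g"));
    return StreamExecutionEnvironment.getExecutionEnvironment(conf);
  }
}
```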

ad1happy2go commented 1 year ago

@SendDreams Were you able to get it resolved?