BalaMahesh opened 1 year ago
Can you please try upgrading to the latest version of Trino? HBase was not included as a compile-time dependency previously. If upgrading Trino is not an immediate option for you, then you will have to place the hbase-common jars manually under <TRINO_HOME>/plugin/hudi/.
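For reference, a hedged sketch of the manual approach (the jar version and paths below are illustrative and depend on your setup):
cp hbase-common-<version>.jar <TRINO_HOME>/plugin/hudi/
Trino loads plugin jars at startup, so the servers need a restart afterwards.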
Hey @codope, I had to add a bunch of jars along with hbase-common to get the query running after enabling hudi.metadata_enabled=true.
What I have observed is that the query takes longer with this property set than without it.
I have also observed that Hudi creates a bunch of log files in the metadata folder even after I have set the following configurations:
hoodie.metadata.enable=true
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
hoodie.metadata.compact.max.delta.commits=1
It also seems we don't have control over hoodie.metadata.cleaner.commits.retained to limit the number of log and base files in the metadata folder.
With this many files, when we run a query, Trino has to merge these log and base files in the metadata folder before it can read the parquet files, which adds delay and latency to query run times.
Is this behaviour expected? We assumed queries would be faster with metadata enabled, but we are seeing the reverse.
There are 640 files in the metadata folder.
Query logs without hudi.metadata_enabled:
2023-09-26T11:02:12.915+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.HoodieTableMetaClient Loading HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:02:13.047+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.HoodieTableConfig Loading table properties from gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:02:13.482+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from gs://bucket/hudi/db/table
2023-09-26T11:02:13.788+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:02:13.788+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based view for basePath gs://bucket/hudi/db/table
2023-09-26T11:02:13.789+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 1 ms to read 0 instants, 0 replaced file groups
2023-09-26T11:02:13.794+0530 INFO Query-20230925_142458_00023_we5zs-956 org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending clustering operations
2023-09-26T11:02:13.796+0530 INFO hudi-split-manager-22 org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file system view for partition ()
2023-09-26T11:02:13.938+0530 INFO hudi-split-manager-22 org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView: NumFiles=12, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
2023-09-26T11:02:15.720+0530 INFO dispatcher-query-45 io.trino.event.QueryMonitor TIMELINE: Query 20230925_142458_00023_we5zs :: FINISHED :: elapsed 6814ms :: planning 1191ms :: waiting 2037ms :: scheduling 3855ms :: running 1621ms :: finishing 147ms :: begin 2023-09-26T11:02:08.906+05:30 :: end 2023-09-26T11:02:15.720+05:30
Query logs with hudi.metadata_enabled=true:
2023-09-26T11:04:12.748+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Loading HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:04:12.927+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableConfig Loading table properties from gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:04:13.379+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from gs://bucket/hudi/db/table
2023-09-26T11:04:13.626+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:13.627+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based view for basePath gs://bucket/hudi/db/table
2023-09-26T11:04:13.627+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 0 ms to read 0 instants, 0 replaced file groups
2023-09-26T11:04:13.628+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending clustering operations
2023-09-26T11:04:13.629+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Loading HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:04:13.814+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableConfig Loading table properties from gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:04:14.273+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from gs://bucket/hudi/db/table
2023-09-26T11:04:14.273+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Loading HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:14.499+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableConfig Loading table properties from gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
2023-09-26T11:04:14.969+0530 INFO Query-20230925_142658_00025_we5zs-1005 org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:14.974+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file system view for partition ()
2023-09-26T11:04:14.976+0530 INFO hudi-split-manager-25 org.apache.hudi.metadata.HoodieTableMetadataUtil Loading latest merged file slices for metadata table partition files
2023-09-26T11:04:15.451+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:15.451+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 0 ms to read 0 instants, 0 replaced file groups
2023-09-26T11:04:15.599+0530 INFO hudi-split-manager-25 org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending clustering operations
2023-09-26T11:04:15.600+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file system view for partition (files)
2023-09-26T11:04:16.163+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView: NumFiles=640, NumFileGroups=1, FileGroupsCreationTime=94, StoreTimeTaken=1
2023-09-26T11:04:16.808+0530 INFO hudi-split-manager-25 org.apache.hudi.metadata.HoodieBackedTableMetadata Opened metadata base file from gs://bucket/hudi/db/table/.hoodie/metadata/files/files-0000_0-14014-15896_20230925214227853001.hfile at instant 20230925214227853001 in 638 ms
2023-09-26T11:04:17.016+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:17.023+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.HoodieTableMetaClient Loading HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:17.288+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.HoodieTableConfig Loading table properties from gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
2023-09-26T11:04:17.744+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:17.929+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:18.399+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Scanning log file HoodieLogFile{pathStr='gs://bucket/hudi/db/table/.hoodie/metadata/files/.files-0000_20230925214227853001.log.1_0-14021-15903', fileLen=-1}
2023-09-26T11:04:18.400+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Reading a data block from file gs://bucket/hudi/db/table/.hoodie/metadata/files/.files-0000_20230925214227853001.log.1_0-14021-15903 at instant 20230925214447606
2023-09-26T11:04:18.400+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Merging the final data blocks
2023-09-26T11:04:18.400+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Number of remaining logblocks to merge 1
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.util.collection.ExternalSpillableMap Estimated Payload size => 504
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner Number of log files scanned => 1
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner MaxMemoryInBytes allowed for compaction => 1073741824
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner Number of entries in MemoryBasedMap in ExternalSpillableMap => 2
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 1008
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner Number of entries in BitCaskDiskMap in ExternalSpillableMap => 0
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner Size of file spilled to disk => 0
2023-09-26T11:04:18.421+0530 INFO hudi-split-manager-25 org.apache.hudi.metadata.HoodieBackedTableMetadata Opened 1 metadata log files (dataset instant=20230925214447606, metadata instant=20230925214447606) in 1609 ms
2023-09-26T11:04:18.428+0530 INFO hudi-split-manager-25 org.apache.hudi.metadata.BaseTableMetadata Listed file in partition from metadata: partition=, #files=10
2023-09-26T11:04:18.434+0530 INFO hudi-split-manager-25 org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView: NumFiles=10, NumFileGroups=1, FileGroupsCreationTime=6, StoreTimeTaken=0
2023-09-26T11:04:19.704+0530 INFO dispatcher-query-51 io.trino.event.QueryMonitor TIMELINE: Query 20230925_142658_00025_we5zs :: FINISHED :: elapsed 11021ms :: planning 1219ms :: waiting 2086ms :: scheduling 8620ms :: running 1020ms :: finishing 162ms :: begin 2023-09-26T11:04:08.681+05:30 :: end 2023-09-26T11:04:19.702+05:30
@codope - any fix for this behaviour? We have started seeing OOM errors on the executor pods because of this large number of metadata files.
Id='HoodieFileGroupId{partitionPath='files', fileId='files-0000'}', metrics={TOTAL_LOG_FILES=671.0, TOTAL_IO_READ_MB=7.0, TOTAL_LOG_FILES_SIZE=7456183.0, TOTAL_IO_WRITE_MB=0.0, TOTAL_IO_MB=7.0}, bootstrapFilePath=Optional.empty}] files
23/09/28 03:55:29 INFO SparkContext: Starting job: collect at HoodieJavaRDD.java:163
23/09/28 03:55:29 INFO DAGScheduler: Got job 30 (collect at HoodieJavaRDD.java:163) with 1 output partitions
23/09/28 03:55:29 INFO DAGScheduler: Final stage: ResultStage 54 (collect at HoodieJavaRDD.java:163)
23/09/28 03:55:29 INFO DAGScheduler: Parents of final stage: List()
23/09/28 03:55:29 INFO DAGScheduler: Missing parents: List()
23/09/28 03:55:29 INFO DAGScheduler: Submitting ResultStage 54 (MapPartitionsRDD[108] at map at HoodieJavaRDD.java:111), which has no missing parents
23/09/28 03:55:29 INFO MemoryStore: Block broadcast_40 stored as values in memory (estimated size 384.6 KiB, free 1048.4 MiB)
23/09/28 03:55:29 INFO MemoryStore: Block broadcast_40_piece0 stored as bytes in memory (estimated size 130.1 KiB, free 1048.3 MiB)
23/09/28 03:55:29 INFO BlockManagerInfo: Added broadcast_40_piece0 in memory on glance-grap-studio-bubble-384cd48ad9e9f6d8-driver-svc.spark-jobs.svc:7079 (size: 130.1 KiB, free: 1048.7 MiB)
23/09/28 03:55:29 INFO SparkContext: Created broadcast 40 from broadcast at DAGScheduler.scala:1478
23/09/28 03:55:29 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 54 (MapPartitionsRDD[108] at map at HoodieJavaRDD.java:111) (first 15 tasks are for partitions Vector(0))
23/09/28 03:55:29 INFO TaskSchedulerImpl: Adding task set 54.0 with 1 tasks resource profile 0
23/09/28 03:55:29 INFO TaskSetManager: Starting task 0.0 in stage 54.0 (TID 226) (10.207.40.187, executor 1, partition 0, PROCESS_LOCAL, 42753 bytes) taskResourceAssignments Map()
23/09/28 03:55:29 INFO BlockManagerInfo: Added broadcast_40_piece0 in memory on 10.207.40.187:34929 (size: 130.1 KiB, free: 7.6 GiB)
23/09/28 03:55:34 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:55:44 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:55:54 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:04 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:14 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:24 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:34 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:44 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:56:54 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:57:04 INFO HoodieAsyncService: Waiting for next instant up to 10 seconds
23/09/28 03:57:04 WARN TaskSetManager: Lost task 0.0 in stage 54.0 (TID 226) (10.207.40.187 executor 1): java.lang.OutOfMemoryError: Java heap space
at java.base/java.io.BufferedInputStream.<init>(Unknown Source)
at org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:56)
at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStreamForGCS(HoodieLogFileReader.java:541)
at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:502)
at org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:118)
at org.apache.hudi.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:110)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:247)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:223)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:198)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:114)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:73)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:464)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:204)
at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$9cd4b1be$1(HoodieCompactor.java:129)
at org.apache.hudi.table.action.compact.HoodieCompactor$$Lambda$2356/0x0000000841269c40.apply(Unknown Source)
at org.apache.hudi.data.HoodieJavaRDD$$Lambda$1876/0x0000000840ee1040.call(Unknown Source)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at org.apache.spark.api.java.JavaPairRDD$$$Lambda$655/0x000000084060e840.apply(Unknown Source)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
at org.apache.spark.storage.BlockManager$$Lambda$633/0x00000008405ad840.apply(Unknown Source)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
23/09/28 03:57:04 INFO TaskSetManager: Starting task 0.1 in stage 54.0 (TID 227) (10.207.40.187, executor 1, partition 0
Do you see any metadata compaction in the metadata timeline (under <table_base_path>/.hoodie/metadata/.hoodie)? It appears as if metadata compaction has not happened and there isn't enough memory to merge the metadata records. We can increase the memory fraction.
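If it helps, the knobs for that memory budget are along these lines (to the best of my knowledge; please verify the keys against your Hudi version, and the values below are illustrative):
hoodie.memory.merge.fraction=0.75
hoodie.memory.compaction.fraction=0.75
These control the fraction of available heap the merge/compaction spillable map may use before spilling to disk.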
@BalaMahesh Any update on this? Did the above info help?
@ad1happy2go / @codope -
hoodie.metadata.compact.max.delta.commits=1
With this config I expect compaction to run after every metadata delta commit, but that is not happening: the delta commits pile up in the metadata path, and only after a large number of files has accumulated is compaction triggered, at which point it fails with an OOM error.
I didn't get enough time to go through the code flow for this metadata table.
"We can increase the memory fraction" - if that were the problem, compaction should at least be triggered after every delta commit (and then fail) when hoodie.metadata.compact.max.delta.commits is set to 1.
hoodie.clean.async=false
After setting this to false, compaction is triggered for the metadata table. Earlier there were always pending delta-commit instants, because the async clean kicks in and creates a new delta commit on the metadata table timeline that is later than the latest one on the data timeline.
List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested()
    .findInstantsBeforeOrEquals(latestDeltaCommitTimeInMetadataTable).getInstants();
if (!pendingInstants.isEmpty()) {
  LOG.info(String.format(
      "Cannot compact metadata table as there are %d inflight instants in data table before latest deltacommit in metadata table: %s. Inflight instants in data table: %s",
      pendingInstants.size(), latestDeltaCommitTimeInMetadataTable, Arrays.toString(pendingInstants.toArray())));
  return;
}
This piece of code in HoodieBackedTableMetadataWriter skips compaction whenever such inflight instants exist. And because of this check:
if (lastCompletedCompactionInstant.isPresent()
&& metadataMetaClient.getActiveTimeline().filterCompletedInstants()
.findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
to get the metadata files cleaned, we should always keep
hoodie.metadata.compact.max.delta.commits=3
else the files will never be cleaned and will pile up.
After making these config changes, I can see the metadata files partition getting cleaned; earlier it was piled up with old files.
@danny0405 I see that the limit was hard-coded to 3 in https://github.com/apache/hudi/commit/f1286c2c764d6be9f23b41c76f4de1c8734c1f3b. Should we determine this based on compaction max delta commits?
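A rough sketch of that idea, assuming a hypothetical accessor for hoodie.metadata.compact.max.delta.commits on the metadata write config (not an actual Hudi API):
// Sketch only: derive the threshold from the compaction config instead of the hard-coded 3.
int maxDeltaCommits = metadataWriteConfig.getCompactMaxDeltaCommits(); // hypothetical accessor
if (lastCompletedCompactionInstant.isPresent()
    && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
        .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp())
        .countInstants() < maxDeltaCommits) {
  // Fewer completed instants since the last compaction than the configured
  // threshold: skip cleaning for now, mirroring the current hard-coded check.
  return;
}
That way a table written with hoodie.metadata.compact.max.delta.commits=1 would still get its metadata files cleaned.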
Describe the problem you faced
We have been using Hudi version 0.12.0 and observed that our GCS listing and usage metrics went up. When we brought this up in the Slack community, they suggested updating to 0.13.1. We updated our apps to 0.13.1 and are able to query the tables using the hudi catalog from Trino. But when I tried setting the property below to leverage metadata and improve query performance, we were unable to run queries and saw errors.
hudi.metadata_enabled=true;
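For reference, we set this per session in Trino like so (assuming the catalog is named hudi):
SET SESSION hudi.metadata_enabled = true;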
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Trino leverages the metadata and runs the queries faster.
Environment Description
Hudi version : 0.13.1
Spark version : 3.2.2
Hive version : 2.3.5
Hadoop version : 3.2
Storage (HDFS/S3/GCS..) : GCS
Trino Version : 402 (hudi-trino-bundle-0.12.1.jar placed in hudi plugin)
Running on Docker? (yes/no) : yes
Additional context
Stacktrace