Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.parquet.column.values.dictionary.IntList.initSlab(IntList.java:90) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.values.dictionary.IntList.<init>(IntList.java:86) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter.<init>(DictionaryValuesWriter.java:93) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.<init>(DictionaryValuesWriter.java:229) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.ParquetProperties.dictionaryWriter(ParquetProperties.java:131) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.ParquetProperties.dictWriterWithFallBack(ParquetProperties.java:178) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.ParquetProperties.getValuesWriter(ParquetProperties.java:203) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.impl.ColumnWriterV1.<init>(ColumnWriterV1.java:83) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.impl.ColumnWriteStoreV1.newMemColumn(ColumnWriteStoreV1.java:68) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.column.impl.ColumnWriteStoreV1.getColumnWriter(ColumnWriteStoreV1.java:56) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.<init>(MessageColumnIO.java:184) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.io.MessageColumnIO.getRecordWriter(MessageColumnIO.java:376) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:109) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:99) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:272) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:217) ~[bigdata-hudi-dataware-CK.jar:?]
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.<init>(HoodieRowDataParquetWriter.java:44) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.io.storage.row.HoodieRowDataFileWriterFactory.newParquetInternalRowFileWriter(HoodieRowDataFileWriterFactory.java:77) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.io.storage.row.HoodieRowDataFileWriterFactory.getRowDataFileWriter(HoodieRowDataFileWriterFactory.java:54) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle.createNewFileWriter(HoodieRowDataCreateHandle.java:203) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle.<init>(HoodieRowDataCreateHandle.java:100) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.sink.bucket.BucketBulkInsertWriterHelper.getRowCreateHandle(BucketBulkInsertWriterHelper.java:67) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.sink.bucket.BucketBulkInsertWriterHelper.write(BucketBulkInsertWriterHelper.java:58) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.hudi.sink.bulk.BulkInsertWriteFunction.processElement(BulkInsertWriteFunction.java:124) ~[hudi-flink-bundle_2.12-0.11.0.jar:0.11.0-SNAPSHOT]
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$637/1013620024.runDefaultAction(Unknown Source) ~[?:?]
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
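A possible reading of the trace above (an assumption, not a confirmed diagnosis): with the BUCKET index, each in-flight bucket gets its own HoodieRowDataCreateHandle and therefore its own ParquetWriter, and the Parquet dictionary value writer pre-allocates an IntList slab per column at construction time (ParquetProperties.dictionaryWriter → DictionaryValuesWriter → IntList.initSlab). Heap usage then grows roughly with open writers × columns × per-writer buffer sizes, which can exhaust even a 36 GB heap when the bucket count is high. If that is the cause, shrinking the per-writer Parquet buffers and/or the bucket fan-out should reduce pressure. A hedged sketch using Hudi's Flink table options (option names from Hudi's FlinkOptions; the values and the elided column list are illustrative only):

```sql
-- Illustrative only: tightens per-writer Parquet memory and bucket fan-out.
CREATE TABLE hudi_sink ( /* column list elided */ ) WITH (
  'connector' = 'hudi',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'bulk_insert',
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '64',  -- fewer buckets => fewer concurrent open writers
  'write.parquet.block.size' = '64',         -- MB; smaller row groups buffered per writer
  'write.parquet.max.file.size' = '120'      -- MB; caps data file size
);
```

Raising the TaskManager task heap or lowering write-task parallelism are alternative levers if the option tuning alone is not enough.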
Describe the problem you faced
An OOM (java.lang.OutOfMemoryError: Java heap space) occurred when using bulk_insert into a COW table with the Flink BUCKET index.
To Reproduce
Steps to reproduce the behavior:
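No explicit steps were given; a minimal Flink SQL sketch of the described setup (Hive source → Hudi COW sink with the BUCKET index via bulk_insert). Table names, schema, and the HDFS path are assumptions; only the operation, index type, and table type come from the report:

```sql
-- Hypothetical tables/paths for illustration.
CREATE TABLE hudi_cow_sink (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_cow_sink',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'bulk_insert',
  'index.type' = 'BUCKET'
);

-- hive_source: an existing table registered in the Hive catalog (assumed)
INSERT INTO hudi_cow_sink SELECT id, name FROM hive_source;
```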
Expected behavior
The data source is Hive and the sink is Hudi via Flink; the bulk_insert job is expected to complete without running out of heap.
Environment Description
Hudi version : master (0.11.0-SNAPSHOT)
Flink version : 1.14.3
Hadoop version : 3.0.0
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Additional context
Hudi config
cores & heap size total : 6 cores / 36 GB
HDFS file
Stacktrace (included at the top of this issue)