Closed: chenwyi2 closed this issue 1 month ago.
Have you tried with more executor memory? I know the ORC writer currently doesn't roll over, so you could just be buffering a very large Spark task before writing.
If the issue is solved with more executor memory, you could either keep that setting or try breaking the write tasks into smaller chunks (see the sketch below).
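A minimal PySpark sketch of both workarounds; the memory value, partition count, and table names are placeholders, not recommendations:

```python
from pyspark.sql import SparkSession

# Option 1: raise the executor heap so a large write task can buffer
# its ORC batches (8g is an arbitrary placeholder; tune per workload).
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)

# Option 2: break the write into smaller tasks by repartitioning before
# the write, so each task buffers less data at once.
df = spark.table("db.source_table")  # hypothetical source table
df.repartition(200).writeTo("db.iceberg_orc_table").append()
```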
When I gave 10G of executor memory, it was OK. Compared with Parquet, ORC uses more executor memory. Will you support ORC writer roll-over in order to decrease memory consumption?
Pull requests are welcome :) If you would like to add an ORC rolling file writer, I think that would be appreciated. I don't know of many ORC users, so I think that's why we haven't seen it added yet.
It seems like https://github.com/apache/iceberg/pull/3784/ has already added a rolling writer for ORC; maybe I should upgrade my Iceberg version. Thanks!
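For later readers: with a rolling writer in place, the roll-over point is driven by the table's target file size. A hedged sketch of lowering it via Spark SQL (`write.target-file-size-bytes` is a standard Iceberg table property; the table name and value are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Lower the target file size so the rolling writer closes files sooner.
# 134217728 bytes (128 MB) is a placeholder; the Iceberg default is 512 MB.
spark.sql("""
    ALTER TABLE db.iceberg_orc_table
    SET TBLPROPERTIES ('write.target-file-size-bytes' = '134217728')
""")
```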
But when I use Flink to write with ORC, it still fails with a Java heap space error:

```
java.lang.OutOfMemoryError: Java heap space
    at org.apache.iceberg.shaded.org.apache.orc.storage.ql.exec.vector.LongColumnVector.ensureSize(LongColumnVector.java:314)
    at org.apache.iceberg.shaded.org.apache.orc.storage.ql.exec.vector.StructColumnVector.ensureSize(StructColumnVector.java:136)
    at org.apache.iceberg.flink.data.FlinkOrcWriters.growColumnVector(FlinkOrcWriters.java:314)
    at org.apache.iceberg.flink.data.FlinkOrcWriters.access$500(FlinkOrcWriters.java:47)
    at org.apache.iceberg.flink.data.FlinkOrcWriters$ListWriter.nonNullWrite(FlinkOrcWriters.java:234)
    at org.apache.iceberg.flink.data.FlinkOrcWriters$ListWriter.nonNullWrite(FlinkOrcWriters.java:217)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:41)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.write(GenericOrcWriters.java:509)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.nonNullWrite(GenericOrcWriters.java:495)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:41)
    at org.apache.iceberg.flink.data.FlinkOrcWriters$ListWriter.nonNullWrite(FlinkOrcWriters.java:238)
    at org.apache.iceberg.flink.data.FlinkOrcWriters$ListWriter.nonNullWrite(FlinkOrcWriters.java:217)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:41)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.write(GenericOrcWriters.java:509)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.writeRow(GenericOrcWriters.java:502)
    at org.apache.iceberg.flink.data.FlinkOrcWriter.write(FlinkOrcWriter.java:54)
    at org.apache.iceberg.flink.data.FlinkOrcWriter.write(FlinkOrcWriter.java:38)
    at org.apache.iceberg.orc.OrcFileAppender.add(OrcFileAppender.java:96)
    at org.apache.iceberg.io.DataWriter.write(DataWriter.java:71)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:362)
    at org.apache.iceberg.io.BaseTaskWriter$RollingFileWriter.write(BaseTaskWriter.java:345)
    at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.write(BaseTaskWriter.java:277)
    at org.apache.iceberg.io.PartitionedFanoutWriter.write(PartitionedFanoutWriter.java:68)
    at org.apache.iceberg.flink.sink.IcebergStreamWriter.processElement(IcebergStreamWriter.java:97)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
    at StreamExecCalc$208.processElement_split12(Unknown Source)
    at StreamExecCalc$208.processElement(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
```
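Note that the trace above fails in `growColumnVector`, i.e. while growing the in-memory column vectors for a list column; a rolling writer bounds file size but not this per-batch buffer, so very large list values can still exhaust the heap. One lever that may help, sketched under the assumption that smaller ORC stripes reduce the data buffered before a flush (table name and value are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Shrink the ORC stripe size so less data is buffered per stripe before
# it is flushed; 33554432 bytes (32 MB) is a placeholder (default 64 MB).
spark.sql("""
    ALTER TABLE db.iceberg_orc_table
    SET TBLPROPERTIES ('write.orc.stripe-size-bytes' = '33554432')
""")
```

On the Flink side, raising `taskmanager.memory.task.heap.size` in `flink-conf.yaml` plays the same role that more executor memory did for Spark.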
### Apache Iceberg version

1.2.1

### Query engine

Spark

### Please describe the bug 🐞
We use Spark 3.1. When I used Spark to write to Iceberg with the ORC format, it failed with a Java heap space error. Detailed information is below:

```
java.lang.OutOfMemoryError: Java heap space
    at org.apache.iceberg.shaded.org.apache.orc.storage.ql.exec.vector.LongColumnVector.ensureSize(LongColumnVector.java:314)
    at org.apache.iceberg.shaded.org.apache.orc.storage.ql.exec.vector.StructColumnVector.ensureSize(StructColumnVector.java:136)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters.growColumnVector(SparkOrcValueWriters.java:198)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters.access$300(SparkOrcValueWriters.java:39)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters$ListWriter.nonNullWrite(SparkOrcValueWriters.java:137)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters$ListWriter.nonNullWrite(SparkOrcValueWriters.java:116)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:42)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.write(GenericOrcWriters.java:483)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.nonNullWrite(GenericOrcWriters.java:469)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:42)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters$ListWriter.nonNullWrite(SparkOrcValueWriters.java:140)
    at org.apache.iceberg.spark.data.SparkOrcValueWriters$ListWriter.nonNullWrite(SparkOrcValueWriters.java:116)
    at org.apache.iceberg.orc.OrcValueWriter.write(OrcValueWriter.java:42)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.write(GenericOrcWriters.java:483)
    at org.apache.iceberg.data.orc.GenericOrcWriters$StructWriter.writeRow(GenericOrcWriters.java:476)
    at org.apache.iceberg.spark.data.SparkOrcWriter.write(SparkOrcWriter.java:60)
    at org.apache.iceberg.spark.data.SparkOrcWriter.write(SparkOrcWriter.java:46)
    at org.apache.iceberg.orc.OrcFileAppender.add(OrcFileAppender.java:83)
    at org.apache.iceberg.io.DataWriter.write(DataWriter.java:61)
    at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:103)
    at org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:34)
    at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:629)
    at org.apache.iceberg.spark.source.SparkWrite$PartitionedDataWriter.write(SparkWrite.java:604)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:416)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$$Lambda$1166/1819967781.apply(Unknown Source)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1504)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:452)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:360)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec$$Lambda$716/86102097.apply(Unknown Source)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
```