apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

The snapshots_id is not found in the table.snapshots #9140

Open lpy148145 opened 11 months ago

lpy148145 commented 11 months ago

Apache Iceberg version

1.2.1

Query engine

Flink

Please describe the bug 🐞

The snapshot ID recorded in *.metadata.json cannot be found in the table after a Flink job writing to Iceberg hit an OOM. As a result, downstream query tasks fail with a null pointer.

[Screenshot 2023-11-23 10:15:54 PM] [Screenshot 2023-11-23 10:17:06 PM]

lpy148145 commented 11 months ago

The snapshot with ID 8030770763716459131 exists in the metadata.json file but cannot be found in the table.
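One way to confirm this from the metadata file alone is sketched below. It assumes the `*.metadata.json` file has been copied to a local path and uses the `current-snapshot-id` and `snapshots` fields from the Iceberg table metadata format. The helper names and the sample document are hypothetical, made up for illustration; the sample is not the actual metadata from this table.

```python
import json


def listed_snapshot_ids(metadata: dict) -> set:
    """Collect the snapshot IDs that appear in the metadata's `snapshots` list."""
    return {snap["snapshot-id"] for snap in metadata.get("snapshots") or []}


def is_snapshot_listed(metadata: dict, snapshot_id: int) -> bool:
    """True if `snapshot_id` has an entry in the `snapshots` list."""
    return snapshot_id in listed_snapshot_ids(metadata)


def current_snapshot_is_dangling(metadata: dict) -> bool:
    """True if `current-snapshot-id` is set but has no entry in `snapshots`."""
    current = metadata.get("current-snapshot-id")
    if current is None or current == -1:  # no current snapshot recorded
        return False
    return not is_snapshot_listed(metadata, current)


def load_metadata(path: str) -> dict:
    """Read a metadata JSON file from a local path (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Made-up sample that mimics the reported symptom; not real table metadata.
sample = json.loads("""
{
  "format-version": 1,
  "current-snapshot-id": 8030770763716459131,
  "snapshots": [
    {"snapshot-id": 5295417100067057898, "timestamp-ms": 1700743192652}
  ]
}
""")

print("current snapshot dangling:", current_snapshot_is_dangling(sample))
print("8030770763716459131 listed:", is_snapshot_listed(sample, 8030770763716459131))
```

Running the same two checks against the real file via `load_metadata(...)` would show whether the ID is only referenced or also has a snapshot entry.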

nastra commented 11 months ago

@lpy148145 can you please provide the full stack trace and any additional details that can be helpful?

lpy148145 commented 11 months ago

2023-11-23 20:39:52,422 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Checkpoint 128740 completed. Attempting commit.
2023-11-23 20:39:52,652 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 5295417100067057898 (MergeAppend)
2023-11-23 20:40:00,650 WARN org.apache.iceberg.SnapshotProducer [] - Failed to notify event listeners
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3181) ~[?:1.8.0_292]
    at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.ensureCapacity(ImmutableMap.java:435) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:448) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.SnapshotParser.fromJson(SnapshotParser.java:139) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:458) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:303) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:273) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:407) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:479) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.iceberg.MergingSnapshotProducer.updateEvent(MergingSnapshotProducer.java:1024) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:407) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:479) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$15(StreamTask.java:1353) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$1163/2108408591.run(Unknown Source) ~[?:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$18(StreamTask.java:1392) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$1164/1206801299.run(Unknown Source) ~[?:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352) ~[flink-dist-1.16.1.jar:1.16.1]
2023-11-23 20:40:52,395 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Checkpoint 128741 completed. Attempting commit.
2023-11-23 20:40:52,604 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 5580965293658940911 (MergeAppend)
2023-11-23 20:40:59,717 WARN org.apache.iceberg.SnapshotProducer [] - Failed to notify event listeners
java.lang.OutOfMemoryError: GC overhead limit exceeded
2023-11-23 20:40:59,947 INFO org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Closing Source Reader.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newNode(HashMap.java:1750) ~[?:1.8.0_292]
    at java.util.HashMap.putVal(HashMap.java:631) ~[?:1.8.0_292]
    at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_292]
    at java.util.HashSet.add(HashSet.java:220) ~[?:1.8.0_292]
23/11/23 22:38:56 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

2023-11-23 20:39:52,422 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Checkpoint 128740 completed. Attempting commit.
2023-11-23 20:39:52,652 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 5295417100067057898 (MergeAppend)
2023-11-23 20:40:00,650 WARN org.apache.iceberg.SnapshotProducer [] - Failed to notify event listeners
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3181) ~[?:1.8.0_292]
    at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.ensureCapacity(ImmutableMap.java:435) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap$Builder.put(ImmutableMap.java:448) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.SnapshotParser.fromJson(SnapshotParser.java:139) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:458) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:303) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:273) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:266) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:407) ~[ma_jingni-flink-1.2.3.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:104) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.notifyCheckpointComplete(RegularOperatorChain.java:145) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:479) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$15(StreamTask.java:1353) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$1163/2108408591.run(Unknown Source) ~[?:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$18(StreamTask.java:1392) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$1164/1206801299.run(Unknown Source) ~[?:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352) ~[flink-dist-1.16.1.jar:1.16.1]
2023-11-23 20:40:52,395 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Checkpoint 128741 completed. Attempting commit.
2023-11-23 20:40:52,604 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 5580965293658940911 (MergeAppend)
2023-11-23 20:40:59,717 WARN org.apache.iceberg.SnapshotProducer [] - Failed to notify event listeners
java.lang.OutOfMemoryError: GC overhead limit exceeded
2023-11-23 20:40:59,947 INFO org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Closing Source Reader.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newNode(HashMap.java:1750) ~[?:1.8.0_292]
    at java.util.HashMap.putVal(HashMap.java:631) ~[?:1.8.0_292]
    at java.util.HashMap.put(HashMap.java:612) ~[?:1.8.0_292]
    at java.util.HashSet.add(HashSet.java:220) ~[?:1.8.0_292]
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) ~[?:1.8.0_292]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [flink-rpc-akka_4662f6b1-1d2c-4c32-866e-7872b4432f8e.jar:1.16.1]
2023-11-23 20:40:59,948 INFO org.apache.kafka.common.metrics.Metrics [] - Metrics scheduler closed
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367) ~[flink-dist-1.16.1.jar:1.16.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352) ~[flink-dist-1.16.1.jar:1.16.1]
2023-11-23 20:40:00,880 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed append to table: ma_catalog.mz_analytics.ods_ma_base, branch: main, checkpointId 128740 in 8448 ms
2023-11-23 20:40:52,244 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Start to flush snapshot state to state backend, table: ma_catalog.mz_analytics.ods_ma_base, checkpointId: 128741
2023-11-23 20:40:52,395 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Checkpoint 128741 completed. Attempting commit.
2023-11-23 20:40:52,421 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committing append for checkpoint 128741 to table ma_catalog.mz_analytics.ods_ma_base branch main with summary: CommitSummary{dataFilesCount=1, dataFilesRecordCount=4, dataFilesByteCount=3984, deleteFilesCount=0, deleteFilesRecordCount=0, deleteFilesByteCount=0}
2023-11-23 20:40:52,576 INFO org.apache.iceberg.hadoop.HadoopTableOperations [] - Committed a new metadata file hdfs://nbns2/omi/mz_analytics/data/warehouse/mz_analytics/ods_ma_base/metadata/v431215.metadata.json
2023-11-23 20:40:52,604 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 5580965293658940911 (MergeAppend)
2023-11-23 20:40:59,717 WARN org.apache.iceberg.SnapshotProducer [] - Failed to notify event listeners
java.lang.OutOfMemoryError: GC overhead limit exceeded
2023-11-23 20:40:59,947 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed append to table: ma_catalog.mz_analytics.ods_ma_base, branch: main, checkpointId 128741 in 7525 ms
2023-11-23 20:40:59,947 INFO org.apache.flink.connector.base.source.reader.SourceReaderBase [] - Closing Source Reader.
2023-11-23 20:40:59,948 INFO org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher [] - Shutting down split fetcher 0
2023-11-23 20:40:59,946 WARN org.jboss.netty.channel.socket.nio.AbstractNioSelector [] - Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
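
To cross-check a log like this against the table metadata, the committed snapshot IDs can be pulled out of the log text. The following is a rough sketch that relies only on the `Committed snapshot <id> (<operation>)` message format visible above; the function name is hypothetical, and the sample string is a short excerpt copied from this log.

```python
import re

# Matches messages such as "Committed snapshot 5295417100067057898 (MergeAppend)".
COMMITTED_SNAPSHOT = re.compile(r"Committed snapshot (\d+) \((\w+)\)")


def committed_snapshot_ids(log_text: str) -> list:
    """Return committed snapshot IDs in first-seen order, without repeats."""
    seen = []
    for match in COMMITTED_SNAPSHOT.finditer(log_text):
        snapshot_id = int(match.group(1))
        if snapshot_id not in seen:  # the same commit line can appear more than once
            seen.append(snapshot_id)
    return seen


# Excerpt copied from the log in this comment.
log_excerpt = (
    "2023-11-23 20:39:52,652 INFO org.apache.iceberg.SnapshotProducer [] - "
    "Committed snapshot 5295417100067057898 (MergeAppend) "
    "2023-11-23 20:40:00,650 WARN org.apache.iceberg.SnapshotProducer [] - "
    "Failed to notify event listeners "
    "2023-11-23 20:40:52,604 INFO org.apache.iceberg.SnapshotProducer [] - "
    "Committed snapshot 5580965293658940911 (MergeAppend) "
    "2023-11-23 20:40:52,604 INFO org.apache.iceberg.SnapshotProducer [] - "
    "Committed snapshot 5580965293658940911 (MergeAppend)"
)

for snapshot_id in committed_snapshot_ids(log_excerpt):
    print(snapshot_id)
```

Because the pattern does not depend on line breaks, it works on the log whether it is pasted as one run of text or one entry per line.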

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.