apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Module Name] Physical memory exhausted when syncing files #7866

Open linruzhou opened 1 month ago

linruzhou commented 1 month ago

What happened

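When using SeaTunnel to sync a large number of small files to MinIO (for example, 10,000 JPG files), the job runs normally at first, but by the end it exhausts the machine's physical memory.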
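
To try the scenario above on a test machine, a set of small files is needed first. The following is a minimal sketch for generating one, not something taken from the original report: the helper name `make_small_files`, the 50 KiB file size, the `img_` file naming, and the temporary output directory are all assumptions, and the files contain random bytes rather than valid JPEG data.

```python
import os
import tempfile
from pathlib import Path


def make_small_files(target_dir: str, count: int = 10_000, size_bytes: int = 50 * 1024) -> int:
    """Write `count` files of random bytes with a .jpg suffix; return how many were written."""
    root = Path(target_dir)
    root.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Random bytes stand in for image data (assumption); these are not valid JPEGs.
        (root / f"img_{i:05d}.jpg").write_bytes(os.urandom(size_bytes))
    return count


if __name__ == "__main__":
    # Hypothetical output location; point this at the directory the job reads from.
    out_dir = tempfile.mkdtemp(prefix="seatunnel_small_files_")
    written = make_small_files(out_dir)
    print(f"wrote {written} files to {out_dir}")
```

The file count matches the 10,000 mentioned in the report; the size is a guess and can be adjusted to match the real data.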

SeaTunnel Version

2.3.2

SeaTunnel Config

seatunnel:
  engine:
    backup-count: 1
    task_execution_thread_share_mode: OFF
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    history-job-expire-minutes: 10
    classloader-cache-mode: true
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 100000
      timeout: 60000
      max-concurrent: 5
      tolerable-failure: 2
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/ # Ensure that the directory has written permission

Running Command

./seatunnel-cluster.sh

Error Exception

Physical memory is exhausted. The physical machine has 32 GB of RAM.
JVM configuration:
# JVM Heap
-Xms12g
-Xmx12g

# JVM Dump
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/seatunnel/dump/zeta-server
-XX:+UseG1GC
-XX:MaxGCPauseMillis=5000
-XX:MaxMetaspaceSize=4g
-XX:G1HeapRegionSize=32M
-XX:GCTimeRatio=4
-XX:G1ReservePercent=15
-XX:ConcGCThreads=8
-XX:+UseStringDeduplication
-XX:InitiatingHeapOccupancyPercent=50
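
As a rough check of the numbers above, the sketch below adds up the two memory caps that appear in these options (`-Xmx` for the Java heap and `-XX:MaxMetaspaceSize` for metaspace) and compares the sum with the machine's 32 GB. It assumes "32G" means 32 GiB; the helper names `parse_size` and `capped_bytes` are illustrative. It does not diagnose the cause. It only shows how much of the machine's memory these two flags leave unbounded, since neither limits thread stacks or native allocations made outside the heap.

```python
import re

# Only the options that set an upper bound are copied from the report above.
JVM_OPTIONS = """
-Xms12g
-Xmx12g
-XX:MaxMetaspaceSize=4g
"""

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}
GIB = 1024**3


def parse_size(text: str) -> int:
    """Convert a JVM size string such as '12g' or '32M' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.lower(), 1)


def capped_bytes(options: str) -> int:
    """Sum the max heap (-Xmx) and max metaspace caps found in the options."""
    total = 0
    for option in options.split():
        if option.startswith("-Xmx"):
            total += parse_size(option[len("-Xmx"):])
        elif option.startswith("-XX:MaxMetaspaceSize="):
            total += parse_size(option.split("=", 1)[1])
    return total


physical = 32 * GIB  # assumption: the reported "32G" is 32 GiB
capped = capped_bytes(JVM_OPTIONS)
print(f"capped by these flags: {capped / GIB:.0f} GiB")
print(f"not capped by these flags: {(physical - capped) / GIB:.0f} GiB")
```

With the values from this report, heap and metaspace together are capped at 16 GiB, so exhausting all 32 GiB implies that a large share of the usage lies outside those two regions or outside the SeaTunnel process.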

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

jdk1.8

Screenshots

No response

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.