Closed www2388258980 closed 1 year ago
https://github.com/aws/aws-sdk-java/issues/1405 — too much parallelism may cause this problem.
Job parallelism is 1, and three jobs are running in a single TaskManager.
We can try setting fs.s3a.connection.maximum=1000.
Too many small files can exhaust the S3 connection pool; the pool size can be increased via fs.s3a.connection.maximum.
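One plausible place to raise the pool size is the Paimon catalog definition, assuming your S3 filesystem wiring forwards fs.s3a.* keys to the underlying Hadoop client (the catalog name, warehouse path, and the value 1000 below are placeholders, not from this issue):

```sql
-- Hedged sketch: enlarge the s3a connection pool when defining the Paimon catalog.
-- Whether 'fs.s3a.connection.maximum' is forwarded depends on the S3 plugin in use;
-- the same key can alternatively be set in Hadoop's core-site.xml.
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 's3://my-bucket/hadoop/warehouse',  -- placeholder path
  'fs.s3a.connection.maximum' = '1000'
);
USE CATALOG paimon_catalog;
```

If the key is not picked up from catalog options in your setup, setting it in core-site.xml on the TaskManager hosts is the more conventional route.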
References:
[1] https://paimon.apache.org/docs/master/maintenance/expiring-snapshots/
[2] https://www.infoq.cn/article/dytkx8luglcu9a81f58q
[3] https://docs.aws.amazon.com/zh_cn/sdk-for-java/latest/developer-guide/best-practices.html
[4] https://zhuanlan.zhihu.com/p/559718865
https://github.com/apache/incubator-paimon/pull/1037 also fixed this
Search before asking
Paimon version
Paimon 0.4
Compute Engine
Flink 1.16
Minimal reproduce step
Description
/bin/yarn-session.sh --detached \
-Dtaskmanager.memory.process.size=5000m \
-Dtaskmanager.memory.managed.size=0m \
-Dtaskmanager.memory.network.min=80m \
-Dtaskmanager.memory.network.max=80m \
-Dtaskmanager.numberOfTaskSlots=4
Flink on YARN
The filesystem is S3. We use Paimon to build a layered real-time data warehouse: for example, several ODS Paimon tables are read and written into one wide table with 'merge-engine' = 'partial-update'.
After running for some time (half an hour to more than an hour), the following error occurs.
Other jobs (Flink CDC) inserting into s3://xxxxxxx/hadoop/warehouse/ods_medatc_fts.db/src_public_comments/schema/schema-0 run normally.
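The described pipeline can be sketched roughly as below; the table and column names are illustrative, not the ones from this job (only the 'merge-engine' = 'partial-update' option comes from the report):

```sql
-- Hedged sketch: a wide table that several ODS streams partially update.
-- Columns each stream does not own are written as NULL so partial-update
-- merges the non-NULL fields per primary key.
CREATE TABLE dwd_wide_table (
  id BIGINT,
  col_from_ods_a STRING,
  col_from_ods_b STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'merge-engine' = 'partial-update'
);

INSERT INTO dwd_wide_table
SELECT id, col_a, CAST(NULL AS STRING) FROM ods_table_a;
```

Each source stream reads its ODS Paimon table and opens S3 connections for schema and data files, which is where the shared connection pool comes under pressure.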
java.io.UncheckedIOException: java.io.InterruptedIOException: getFileStatus on s3://xxxxxxx/hadoop/warehouse/ods_medatc_fts.db/src_public_comments/schema/schema-0: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at org.apache.paimon.schema.SchemaManager.schema(SchemaManager.java:460)
at org.apache.paimon.operation.KeyValueFileStoreRead.<init>(KeyValueFileStoreRead.java:88)
at org.apache.paimon.KeyValueFileStore.newRead(KeyValueFileStore.java:84)
at org.apache.paimon.table.ChangelogWithKeyFileStoreTable.newRead(ChangelogWithKeyFileStoreTable.java:193)
at org.apache.paimon.table.source.ReadBuilderImpl.newRead(ReadBuilderImpl.java:81)
at org.apache.paimon.flink.source.FlinkSource.createReader(FlinkSource.java:50)
at org.apache.flink.streaming.api.operators.SourceOperator.initReader(SourceOperator.java:286)
at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask.init(SourceOperatorStreamTask.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:692)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:669)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.InterruptedIOException: getFileStatus on s3://xxxxxxx/hadoop/warehouse/ods_medatc_fts.db/src_public_comments/schema/schema-0: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:395)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:175)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3799)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
at org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
at org.apache.paimon.s3.HadoopCompliantFileIO.newInputStream(HadoopCompliantFileIO.java:47)
at org.apache.paimon.fs.PluginFileIO.lambda$newInputStream$0(PluginFileIO.java:47)
at org.apache.paimon.fs.PluginFileIO.wrap(PluginFileIO.java:104)
at org.apache.paimon.fs.PluginFileIO.newInputStream(PluginFileIO.java:47)
at org.apache.paimon.fs.FileIO.readFileUtf8(FileIO.java:173)
at org.apache.paimon.schema.SchemaManager.schema(SchemaManager.java:458)
... 14 more
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$10(S3AFileSystem.java:2545)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2533)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2513)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3776)
... 25 more
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
at com.amazonaws.http.conn.$Proxy46.get(Unknown Source)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
What doesn't meet your expectations?
The job should not fail with a connection-pool timeout; please fix this.
Anything else?
No response
Are you willing to submit a PR?