apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Flink Async Compaction MOR Table, OutOfMemoryError: Requested array size exceeds VM limit #8902

Open BohanZhang0222 opened 1 year ago

BohanZhang0222 commented 1 year ago

To Reproduce

Steps to reproduce the behavior:

  1. Write a MOR table with Flink SQL.
  2. Run Flink async compaction on the MOR table.
  3. RemoteHoodieTableFileSystemView logs "Sending request : (http://IP:11195/v1/hoodie/view/filegroups/all/partition" and the request fails with Server Error.
  4. The service log shows: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Stacktrace

TaskManager (TM) log:

2023-06-08 08:57:52,595 ERROR org.apache.hudi.common.table.view.PriorityBasedFileSystemView [] - Got error running preferred function. Trying secondary
org.apache.hudi.exception.HoodieRemoteException: Server Error
        at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getAllFileGroups(RemoteHoodieTableFileSystemView.java:403) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:84) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.getAllFileGroups(PriorityBasedFileSystemView.java:211) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:315) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanner.getFilesToCleanKeepingLatestCommits(CleanPlanner.java:278) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanner.getDeletePaths(CleanPlanner.java:448) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.lambda$requestClean$e73901b4$1(CleanPlanActionExecutor.java:121) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) [?:1.8.0_162]
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) [?:1.8.0_162]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) [?:1.8.0_162]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) [?:1.8.0_162]
        at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747) [?:1.8.0_162]
        at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721) [?:1.8.0_162]
        at java.util.stream.AbstractTask.compute(AbstractTask.java:316) [?:1.8.0_162]
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) [?:1.8.0_162]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_162]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_162]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_162]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_162]
Caused by: org.apache.hudi.org.apache.http.client.HttpResponseException: Server Error
        at org.apache.hudi.org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:69) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.http.client.fluent.Response.handleResponse(Response.java:90) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.http.client.fluent.Response.returnContent(Response.java:97) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:186) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.getAllFileGroups(RemoteHoodieTableFileSystemView.java:399) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        ... 19 more
2023-06-08 08:57:52,601 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView [] - Building file system view for partition (dt=2023-06-07)

Compaction service log:

2023-06-08 11:37:30,476 INFO  org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor [] - Job has been submitted with JobID 5c7f494ef6cda391a317823c13c0577e
2023-06-08 11:33:46,003 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView [] - addFilesToView: NumFiles=3613, NumFileGroups=1235, FileGroupsCreationTime=340, StoreTimeTaken=0
2023-06-08 11:33:47,326 INFO  org.apache.hudi.common.table.view.AbstractTableFileSystemView [] - addFilesToView: NumFiles=21175, NumFileGroups=6410, FileGroupsCreationTime=1407, StoreTimeTaken=4
2023-06-08 11:33:56,733 ERROR io.javalin.Javalin                                           [] - Exception occurred while servicing http-request
java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_162]
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:991) ~[?:1.8.0_162]
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124) ~[?:1.8.0_162]
        at io.javalin.http.JavalinServletHandler.queueNextTaskOrFinish$javalin(JavalinServletHandler.kt:85) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:89) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at io.javalin.jetty.JavalinJettyServlet.service(JavalinJettyServlet.kt:58) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at io.javalin.jetty.JettyServer$start$wsAndHttpHandler$1.doHandle(JettyServer.kt:52) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.Server.handle(Server.java:516) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.HttpChannel.handle(HttpChannel.java:479) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.io.FillInterest.fillable(FillInterest.java:105) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [hudi-flink1.14-bundle-0.13.0.jar:0.13.0]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
2023-06-08 11:35:15,278 INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Loaded instants upto : Option{val=[==>20230608113405882__deltacommit__INFLIGHT]}
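
Some readers may not know this particular `OutOfMemoryError`. The message means the JVM refused to create one single array because of the size requested. This is a different detail message from the more common "Java heap space". The sketch below is not Hudi code. It is a hypothetical standalone class, `ArrayLimitDemo`, that triggers the same kind of rejection by requesting a `byte[]` of `Integer.MAX_VALUE` elements. On HotSpot JVMs that length is above the per-array limit. The exact limit and message text may vary by JVM vendor and version, so treat the comments as describing typical HotSpot behavior.

```java
public class ArrayLimitDemo {

    // Hypothetical helper: tries to allocate one byte array of the given length
    // and reports whether the JVM rejected the request with an OutOfMemoryError.
    static String tryAllocate(int length) {
        try {
            byte[] data = new byte[length];
            return "allocated " + data.length + " bytes";
        } catch (OutOfMemoryError e) {
            return "OutOfMemoryError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A small array is allocated normally.
        System.out.println(tryAllocate(16));
        // On HotSpot, Integer.MAX_VALUE is above the maximum array length, so the
        // request is rejected up front (typically with the message
        // "Requested array size exceeds VM limit") before any memory is reserved.
        System.out.println(tryAllocate(Integer.MAX_VALUE));
    }
}
```

In the service log above, the error surfaces while the timeline server handles the `filegroups/all/partition` request. That placement is consistent with one very large array being built while that response is produced.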

danny0405 commented 1 year ago

There is an option for the sort memory; did you try that? Did you try tuning the memory for the Flink JM?

BohanZhang0222 commented 1 year ago

  1. Can you provide the specific option? Thanks.
  2. My start command:

     ./bin/flink run -ynm compaction_ods_ssp_cloud_vss_on_changed_topic_hudi -yqu eps -ytm 8192 -yd -m yarn-cluster -yt /chj/flink/flink-1.14.0/sql_jar -C file:///chj/flink/flink-1.14.0/sql_jar/juicefs-hadoop-1.0.0-lixiang-dip.jar -c org.apache.hudi.sink.compact.HoodieFlinkCompactor /chj/flink/flink-1.14.0/sql_jar/hudi-flink1.14-bundle-0.13.0.jar --path jfs://XXXX --compaction-max-memory 1024 --seq LIFO --compaction-tasks 10 --min-compaction-interval-seconds 120 --service

     The FileSystemViewManager server runs on the host machine, not in the JM.

danny0405 commented 1 year ago

It seems the file system view takes too much memory. It starts on the client machine from which you submit the job.

BohanZhang0222 commented 1 year ago

How do I adjust the client memory?

danny0405 commented 1 year ago

Depending on how you submit the job, you can configure more memory for the startup JVM process.
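
The thread does not name a specific setting, and the right one depends on the launch path. Two routes are worth checking against your Flink version's documentation, and both are assumptions here. The `bin/flink` launcher appends the `JVM_ARGS` environment variable to the client JVM command line. Flink also documents an `env.java.opts.client` option in `flink-conf.yaml`. Whichever route you use, you can confirm that a heap flag such as `-Xmx` actually reached a JVM by reading `Runtime.maxMemory()` and the runtime MXBean's input arguments. The hypothetical `ClientHeapCheck` class below does this for the JVM it runs in. Launch it with the same options you intend to give the client (for example `java -Xmx4g ClientHeapCheck`) to see whether the value you expect is applied.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class ClientHeapCheck {

    // Maximum heap this JVM will try to use, in MiB (roughly the -Xmx value when set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Options passed to this JVM (e.g. "-Xmx4g"), excluding the main class and its args.
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM input arguments: " + jvmArgs());
    }
}
```

If the reported maximum does not change after you adjust the options, those options are not reaching the JVM you are inspecting.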