apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/

[Improvement] off-heap memory usage of the driver is too high #6481

Open ic4y opened 1 month ago

ic4y commented 1 month ago

What would you like to be improved?

Testing with Kyuubi 1.8 + Spark 3.3.2

The following tests were conducted with the engine started but without executing any SQL.

1. Driver memory usage when submitting to the Spark engine via Kyuubi: [screenshot]
2. Driver memory usage when submitting through plain Spark SQL: [screenshot]

Comparing the two, the JVM memory usage of the Kyuubi Spark engine driver differs significantly from the plain Spark SQL driver in the Class, Thread, GC, and Internal categories. The off-heap memory usage of the Kyuubi-submitted driver is around 1GB.

1. Although Kyuubi understandably introduces some additional classes and threads compared to Spark SQL, the size of the difference is still worth noting (and worth examining for possible optimization).
2. With spark.driver.memoryOverhead left at its default, this can get the driver container killed abnormally (YARN decides the actual memory usage exceeds the limit).

Let me give an example. The default spark.driver.memoryOverhead is the greater of spark.driver.memory * 0.1 and 384M. Suppose spark.driver.memory is set to 2048M: 10% of that is only about 205M, so the default overhead is 384M, and YARN kills the driver container once actual memory usage exceeds 2048M + 384M = 2432M. But with off-heap memory already consuming 1024M, only about 1408M is left for the heap, even though the maximum heap size is 2048M. Under normal use the heap can therefore climb past 1400M (much of it garbage that a full GC could shrink to around 200M) without a full GC ever being triggered, and the container gets killed.
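To make the arithmetic concrete, here is a minimal Scala sketch of the same calculation (the object and variable names are illustrative, not Spark internals):

    object DriverOverheadMath extends App {
      // Default overhead: max(driverMemory * 0.1, 384M), per the formula above.
      val driverMemoryMb   = 2048L
      val memoryOverheadMb = math.max((driverMemoryMb * 0.1).toLong, 384L)  // 384
      // YARN kills the container once heap + off-heap exceeds this limit.
      val containerLimitMb = driverMemoryMb + memoryOverheadMb              // 2432
      // With ~1024M of off-heap already in use, the heap only has:
      val offHeapUsedMb    = 1024L
      val heapHeadroomMb   = containerLimitMb - offHeapUsedMb               // 1408
      println(s"limit=${containerLimitMb}M headroom=${heapHeadroomMb}M Xmx=${driverMemoryMb}M")
    }

The heap headroom (~1408M) sits well below Xmx (2048M), which is exactly the window in which the container dies before a full GC ever runs.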

This leads to a situation where setting spark.driver.memory=6G and leaving memoryOverhead at its default (0.1 * memory = 600M) gets the program killed, while setting spark.driver.memory=2G together with spark.driver.memoryOverhead=1G lets it run successfully. Because memoryOverhead does not match the actual off-heap usage, the container is killed before heap usage even reaches the spark.driver.memory limit, whereas a full GC typically only happens once heap usage approaches that limit.

Therefore, once this issue is confirmed, I strongly recommend either documenting spark.driver.memoryOverhead=1G in a prominent place or making it a Kyuubi default. Whether a similar issue exists for executors has not been tested yet.
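As a concrete example, this is the kind of snippet I mean (Kyuubi passes spark.* entries in kyuubi-defaults.conf through to the engine's spark-submit):

    # Reserve 1g of off-heap headroom for the Kyuubi-launched driver
    spark.driver.memoryOverhead=1g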

This can resolve many SQL execution stability problems. After adding this parameter on our cluster, the number of failed SQL runs dropped noticeably.


pan3793 commented 1 month ago

An interesting diagnosis.

TBH, the direct memory usage of the JVM has always been a mystery to me, and I agree with you that we should find a way to analyze the direct memory used by the Spark driver launched by Kyuubi.
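For reference, one standard way to get such a breakdown is the JVM's Native Memory Tracking, which is what the summaries above appear to show; a minimal setup looks like:

    # Enable NMT on the driver JVM
    spark.driver.extraJavaOptions=-XX:NativeMemoryTracking=summary

    # Then, on the driver host, dump the Class/Thread/GC/Internal breakdown
    jcmd <driver-pid> VM.native_memory summary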

... the engine started but without executing any SQL.

Have you unset kyuubi.engine.initialize.sql=SHOW DATABASES? If not, it will initialize a HiveClient and create a bunch of Hive objects.

I also think the default value of spark.[driver|executor].memoryOverheadFactor, 0.1, is too small for most production Spark jobs.

Internally, to tackle the YARN OOM kill issues, we have done:

  • SPARK-47208 introduces spark.[driver|executor].minMemoryOverhead to make the hardcoded 384m configurable; in practice, we set it to 2g to gain stability.
  • Disable direct memory usage of Netty by setting spark.network.io.preferDirectBufs=false and spark.shuffle.io.preferDirectBufs=false.
  • Lower the concurrency of shuffle block fetching network requests by setting spark.reducer.maxReqsInFlight=256.
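Put together in spark-defaults.conf form, these mitigations look like the following (values as above; tune to your workload):

    # Spark 4.0+ only (SPARK-47208)
    spark.driver.minMemoryOverhead=2g
    spark.executor.minMemoryOverhead=2g
    # Keep Netty buffers on-heap
    spark.network.io.preferDirectBufs=false
    spark.shuffle.io.preferDirectBufs=false
    # Cap concurrent shuffle block fetch requests
    spark.reducer.maxReqsInFlight=256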

pan3793 commented 1 month ago

Given that increasing the default driver memoryOverhead does improve the stability of Spark jobs, in addition to documenting the suggestion, what do you think about doing setIfMissing spark.driver.minMemoryOverhead=1g in SparkProcessBuilder? Then users who run Spark 4.0+ would get the benefit out of the box.
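A rough, self-contained sketch of the idea (the conf map and helper here are illustrative, not SparkProcessBuilder's actual code):

    import scala.collection.mutable

    object MinOverheadDefault extends App {
      // Stand-in for the engine's Spark configuration.
      val conf = mutable.Map("spark.driver.memory" -> "4g")

      // Apply a default only when the user has not set the key themselves.
      def setIfMissing(key: String, value: String): Unit =
        if (!conf.contains(key)) conf.update(key, value)

      // Spark 4.0+ honors this key (SPARK-47208); older versions simply
      // carry it as an unused conf entry.
      setIfMissing("spark.driver.minMemoryOverhead", "1g")
      println(conf)
    }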

ic4y commented 1 month ago

Have you unset kyuubi.engine.initialize.sql=SHOW DATABASES? If not, it will initialize a HiveClient and create a bunch of Hive objects.

I did overlook this point: kyuubi.engine.initialize.sql=select 1 had been executed before I collected the statistics. The screenshot below shows the state after I unset kyuubi.engine.initialize.sql, meaning no SQL was executed.

[screenshot: Native Memory Tracking summary]

The overall difference from my previous tests is not significant.

ic4y commented 1 month ago

I also think the default value of spark.[driver|executor].memoryOverheadFactor, 0.1, is too small for most production Spark jobs.

To tackle the YARN OOM kill issues, we have done:

  • SPARK-47208 introduces spark.[driver|executor].minMemoryOverhead to make the hardcoded 384m configurable; in practice, we set it to 2g to gain stability.
  • Disable direct memory usage of Netty by setting spark.network.io.preferDirectBufs=false and spark.shuffle.io.preferDirectBufs=false.
  • Lower the concurrency of shuffle block fetching network requests by setting spark.reducer.maxReqsInFlight=256.

I completely agree with your point of view. Thank you for providing these parameter suggestions; they are very helpful, and I will try them out.

ic4y commented 1 month ago

Given that increasing the default driver memoryOverhead does improve the stability of Spark jobs, in addition to documenting the suggestion, what do you think about doing setIfMissing spark.driver.minMemoryOverhead=1g in SparkProcessBuilder? Then users who run Spark 4.0+ would get the benefit out of the box.

I think using setIfMissing spark.driver.minMemoryOverhead=1g in SparkProcessBuilder is reasonable, since the baseline off-heap overhead already reaches about 1G. However, I am not very familiar with Kyuubi and Spark internals, so why is this change specific to Spark 4.0+?

pan3793 commented 1 month ago

SPARK-47208 was added in Spark 4.0, so earlier OSS Spark versions don't recognize spark.driver.minMemoryOverhead.

ic4y commented 1 month ago

SPARK-47208 was added in Spark 4.0, so earlier OSS Spark versions don't recognize spark.driver.minMemoryOverhead.

Got it. We can set both spark.driver.minMemoryOverhead=1g and spark.driver.memoryOverhead=1g to ensure compatibility with more versions.

pan3793 commented 1 month ago

setIfMissing spark.driver.memoryOverhead=1g is not suitable, because the user may set a larger spark.driver.memory; in that case the factor-based default (0.1 * memory) would already exceed 1g, and pinning the overhead to 1g would actually shrink it.