apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Modifying kyuubi-defaults.conf does not take effect when starting Spark for the second time #6536

Open · A-little-bit-of-data opened this issue 1 month ago

A-little-bit-of-data commented 1 month ago

Describe the bug

When kyuubi-defaults.conf is modified and Spark is started again, the new configuration is not loaded.

kyuubi-defaults.conf was changed from the first block below (old values) to the second (new values), but the Spark task started on K8s still reads the old values.

trino.spark.executor.instances=2
trino.spark.driver.cores=1
trino.spark.executor.cores=2
trino.spark.kubernetes.driver.limit.cores=2
trino.spark.kubernetes.executor.limit.cores=3
trino.spark.driver.memory=1g
trino.spark.executor.memory=2g

trino.spark.executor.instances=2
trino.spark.driver.cores=1
trino.spark.executor.cores=2
trino.spark.kubernetes.driver.limit.cores=2
trino.spark.kubernetes.executor.limit.cores=4
trino.spark.driver.memory=1g
trino.spark.executor.memory=4g

The spark.kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL parameter is configured, but after the engine finishes, the Spark driver pod still exists with status Completed and has to be deleted manually.
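Until the automatic cleanup works, the leftover driver pods can be removed by hand. A minimal sketch, assuming the pods carry the kyuubi-unique-tag label visible in the spark-submit logs below and live in a namespace given here only as a placeholder:

# Hypothetical manual cleanup: delete finished Spark driver pods that carry the
# kyuubi-unique-tag label; <engine-namespace> is a placeholder.
kubectl delete pod -n <engine-namespace> \
    -l kyuubi-unique-tag \
    --field-selector=status.phase=Succeeded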

Affects Version(s)

1.9.1

Kyuubi Server Log Output

/opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/bin/spark-submit \
    --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
    --conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
    --conf spark.kyuubi.client.version=1.9.1 \
    --conf spark.kyuubi.engine.engineLog.path=/opt/kyuubi/work/trino/kyuubi-spark-sql-engine.log.2 \
    --conf spark.kyuubi.engine.share.level=USER \
    --conf spark.kyuubi.engine.submit.time=1721112987920 \
    --conf spark.kyuubi.engine.type=SPARK_SQL \
    --conf spark.kyuubi.frontend.connection.url.use.hostname=false \
    --conf spark.kyuubi.frontend.protocols=REST,THRIFT_BINARY \
    --conf spark.kyuubi.ha.engine.ref.id=9653cc02-df96-4691-82b9-3452c5c2cdc4 \
    --conf spark.kyuubi.ha.namespace=/kyuubi_1.9.1_USER_SPARK_SQL/trino/default \
    --conf spark.kyuubi.ha.zookeeper.session.timeout=600000 \
    --conf spark.kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL \
    --conf spark.kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M \
    --conf spark.kyuubi.metrics.enabled=true \
    --conf spark.kyuubi.metrics.reporters= \
    --conf spark.kyuubi.session.engine.check.interval=PT1M \
    --conf spark.kyuubi.session.engine.idle.timeout=PT2M \
    --conf spark.kyuubi.session.real.user=trino \
    --conf spark.app.name=trino \
    --conf spark.driver.cores=1 \
    --conf spark.driver.memory=1g \
    --conf spark.executor.cores=2 \
    --conf spark.executor.instances=2 \
    --conf spark.executor.memory=2g \
    --conf spark.kubernetes.driver.label.kyuubi-unique-tag=9653cc02-df96-4691-82b9-3452c5c2cdc4 \
    --conf spark.kubernetes.driver.limit.cores=2 \
    --conf spark.kubernetes.driver.pod.name=kyuubi-trino-9653cc02-df96-4691-82b9-3452c5c2cdc4-driver \
    --conf spark.kubernetes.executor.limit.cores=3 \
    --conf spark.kubernetes.executor.podNamePrefix=kyuubi-trino-9653cc02-df96-4691-82b9-3452c5c2cdc4 \
    --conf spark.user=trino \
    --conf spark.kubernetes.driverEnv.SPARK_USER_NAME=trino \
    --conf spark.executorEnv.SPARK_USER_NAME=trino \
    --proxy-user trino /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.9.1.jar

 /opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/bin/spark-submit \
    --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
    --conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
    --conf spark.kyuubi.client.version=1.9.1 \
    --conf spark.kyuubi.engine.engineLog.path=/opt/kyuubi/work/trino/kyuubi-spark-sql-engine.log.3 \
    --conf spark.kyuubi.engine.share.level=USER \
    --conf spark.kyuubi.engine.submit.time=1721114348590 \
    --conf spark.kyuubi.engine.type=SPARK_SQL \
    --conf spark.kyuubi.frontend.connection.url.use.hostname=false \
    --conf spark.kyuubi.frontend.protocols=REST,THRIFT_BINARY \
    --conf spark.kyuubi.ha.engine.ref.id=f93fdc9d-402b-4d9c-ba6e-7f01a4451f19 \
    --conf spark.kyuubi.ha.namespace=/kyuubi_1.9.1_USER_SPARK_SQL/trino/default \
    --conf spark.kyuubi.ha.zookeeper.session.timeout=600000 \
    --conf spark.kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL \
    --conf spark.kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M \
    --conf spark.kyuubi.metrics.enabled=true \
    --conf spark.kyuubi.metrics.reporters= \
    --conf spark.kyuubi.session.engine.check.interval=PT1M \
    --conf spark.kyuubi.session.engine.idle.timeout=PT2M \
    --conf spark.kyuubi.session.real.user=trino \
    --conf spark.app.name=trino \
    --conf spark.driver.cores=1 \
    --conf spark.driver.memory=1g \
    --conf spark.executor.cores=2 \
    --conf spark.executor.instances=2 \
    --conf spark.executor.memory=2g \
    --conf spark.kubernetes.driver.label.kyuubi-unique-tag=f93fdc9d-402b-4d9c-ba6e-7f01a4451f19 \
    --conf spark.kubernetes.driver.limit.cores=2 \
    --conf spark.kubernetes.driver.pod.name=kyuubi-trino-f93fdc9d-402b-4d9c-ba6e-7f01a4451f19-driver \
    --conf spark.kubernetes.executor.limit.cores=3 \
    --conf spark.kubernetes.executor.podNamePrefix=kyuubi-trino-f93fdc9d-402b-4d9c-ba6e-7f01a4451f19 \
    --conf spark.user=trino \
    --conf spark.kubernetes.driverEnv.SPARK_USER_NAME=trino \
    --conf spark.executorEnv.SPARK_USER_NAME=trino \
    --proxy-user trino /opt/kyuubi/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.9.1.jar

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

pan3793 commented 1 month ago

why does your configuration start with trino.?

pan3793 commented 1 month ago

BTW, if you touch kyuubi-defaults.conf, you must restart the Kyuubi process. For your case, maybe you can modify spark-defaults.conf then the newly invoked spark-submit will see the updated Spark configurations.
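A minimal sketch of what that could look like, reusing the new values from this issue. Unlike the per-user prefix in kyuubi-defaults.conf, entries in spark-defaults.conf apply to every engine launched from this Spark installation; based on the SPARK_HOME shown in the logs, the file would live under /opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/conf/.

# Illustrative spark-defaults.conf entries; the next spark-submit invoked by Kyuubi
# should pick these up without a Kyuubi restart.
spark.executor.instances              2
spark.driver.cores                    1
spark.executor.cores                  2
spark.kubernetes.driver.limit.cores   2
spark.kubernetes.executor.limit.cores 4
spark.driver.memory                   1g
spark.executor.memory                 4g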

pan3793 commented 1 month ago

The spark.kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL parameter is configured, but after the engine finishes, the Spark driver pod still exists with status Completed and has to be deleted manually.

This has been reported by other users too, but it is not reproducible on my side. It is likely there is some issue with the K8s client construction; you may want to enable trace-level logs for the Kyuubi server process to find out what happens during K8s client construction.
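A minimal sketch of enabling such trace logging, assuming the stock conf/log4j2.xml shipped with the Kyuubi server and targeting the broad org.apache.kyuubi.engine package rather than a specific class:

<!-- Sketch only: add inside the <Loggers> section of the Kyuubi server's conf/log4j2.xml.
     Replace "stdout" with an appender name that is actually defined in that file. -->
<Logger name="org.apache.kyuubi.engine" level="trace" additivity="false">
  <AppenderRef ref="stdout"/>
</Logger>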

A-little-bit-of-data commented 1 month ago

why does your configuration start with trino.?

(screenshot of kyuubi-defaults.conf) It should be the ___trino___. prefix, used to configure defaults for multiple tenants.
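A minimal sketch of this per-user override syntax in kyuubi-defaults.conf, assuming Kyuubi's documented ___{username}___. user-defaults prefix and reusing the values from this issue:

# Hypothetical kyuubi-defaults.conf excerpt: the ___trino___. prefix applies these
# Spark settings only to engines launched for the user trino.
___trino___.spark.executor.instances=2
___trino___.spark.driver.cores=1
___trino___.spark.executor.cores=2
___trino___.spark.kubernetes.executor.limit.cores=4
___trino___.spark.executor.memory=4g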

A-little-bit-of-data commented 1 month ago

BTW, if you touch kyuubi-defaults.conf, you must restart the Kyuubi process. For your case, maybe you can modify spark-defaults.conf then the newly invoked spark-submit will see the updated Spark configurations.

I have deployed Kyuubi in a container and manage its configuration through a ConfigMap. I only modified kyuubi-defaults.conf, not spark-defaults.conf. Do I need to restart Kyuubi? In version 1.9.1 both kyuubi-defaults.conf and spark-defaults.conf are in the ConfigMap; after modifying these configuration files, do I need to restart Kyuubi to load the new configuration?
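When the configuration comes from a ConfigMap, it can help to confirm what the mounted files actually contain inside the pod. A sketch with a placeholder pod name; the paths are assumptions based on the /opt/kyuubi layout and SPARK_HOME shown in the logs:

# Print the mounted configuration as the Kyuubi process sees it.
kubectl exec -it <kyuubi-pod> -- cat /opt/kyuubi/conf/kyuubi-defaults.conf
kubectl exec -it <kyuubi-pod> -- cat /opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/conf/spark-defaults.conf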

pan3793 commented 1 month ago

Oh, the underscores are being interpreted as Markdown syntax; you need to quote your inline code with backticks.

pan3793 commented 1 month ago

The Kyuubi process consumes kyuubi-defaults.conf, and spark-submit consumes spark-defaults.conf; depending on what you expect to change, update the corresponding configuration file and restart the corresponding process.

A-little-bit-of-data commented 1 month ago

The Kyuubi process consumes kyuubi-defaults.conf, and spark-submit consumes spark-defaults.conf; depending on what you expect to change, update the corresponding configuration file and restart the corresponding process.

Sorry, I don't quite understand what you mean. In my case the Spark-on-K8s task submitted by spark-submit was restarted after spark-defaults.conf was modified, but the new configuration was still not loaded. My understanding of the normal flow is: I modify the spark-defaults.conf file in the kyuubi-spark ConfigMap, the change becomes visible inside the Kyuubi pod, and the next Spark-on-K8s task submitted by spark-submit loads the modified spark-defaults.conf, without having to restart the Kyuubi application for the modified spark-defaults.conf to be read.

pan3793 commented 1 month ago

correct.

in short, the modified spark-defaults.conf will take effect in the next spark-submit, so if you want to change spark conf without restarting Kyuubi process, just modify spark-defaults.conf

A-little-bit-of-data commented 1 month ago

correct.

in short, the modified spark-defaults.conf will take effect in the next spark-submit, so if you want to change spark conf without restarting Kyuubi process, just modify spark-defaults.conf

But the current situation is that, without restarting Kyuubi, the pre-modification content of spark-defaults.conf is always read; only after restarting Kyuubi is the modified content of spark-defaults.conf picked up. After modifying spark-defaults.conf I checked the file inside the Kyuubi pod: spark-defaults.conf in the pod had been updated, yet spark-submit still reads the old configuration from spark-defaults.conf.

pan3793 commented 1 month ago

the content of spark-defaults.conf before the modification is always read.

this is not expected, how do you make this assertion, can you provide some logs/screenshots or other proofs?
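One way to gather such proof, as a sketch only: open a session against the engine and query the effective value. The JDBC URL below is a placeholder, with 10009 being Kyuubi's default Thrift binary port.

# Ask the running engine for the value it actually uses.
beeline -u 'jdbc:hive2://<kyuubi-host>:10009/' -n trino \
        -e 'SET spark.executor.memory;'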

A-little-bit-of-data commented 1 month ago

the content of spark-defaults.conf before the modification is always read.

this is not expected, how do you make this assertion, can you provide some logs/screenshots or other proofs?

There is a timestamp in the submission log, --conf spark.kyuubi.engine.submit.time. That is, after changing the spark-defaults.conf file I waited a while and then restarted, and you can see from the two submissions above that the startup configuration is still the same.

pan3793 commented 1 month ago

Please provide the original logs.