apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.14k stars 416 forks source link

[CORE] Configuration enhancements #6970

Open zhztheplayer opened 3 weeks ago

zhztheplayer commented 3 weeks ago

Description

Gluten's configuration utilities, especially GlutenConfig.scala, is getting messed up over time. To keep code clean and easy to maintain, will propose the below changes:

Simplify and unify option key formats

  1. Could simplify config key prefix from spark.gluten.sql.columnar tp spark.gluten.
  2. Use camel case uniformly, e.g, spark.gluten.sql.columnar.backend.velox.IOThreads to spark.gluten.sql.columnar.backend.velox.ioThreads

Move specific configurations to backend module

For better module-wise isolation, we should move configuration definitions specifically for backends to their own module. E.g., spark.gluten.sql.columnar.backend.velox.window.type should be defined in backends-velox than gluten-core.

(Velox) Remove the list of configurations passing to native

In Velox backend, we currently have a list to decide whether a config option should be passed to native. Rather than using this way, we can just set a flag when define the config. Or just using Spark's built-in static / non-static config flag to distinguish.

Backward compatibility

Since configuration keys may be modified during doing this, we should maintain a list of configuration keys for backward compatibility. E.g., spark.gluten.sql.columnar.backend.velox.IOThreads can be the alternative of spark.gluten.backend.velox.ioThreads. But we should raise a warning when the former is set by user as it's considered deprecated. After several development iterations, support of the old one will be removed completely.

zhztheplayer commented 3 weeks ago

This is only a draft plan. Should indeed revisit before starting to work on it.

Yohahaha commented 3 weeks ago

that's great! I suggest spark.gluten.sql.columnar.backend.velox.ioThreads could be changed to spark.gluten.velox.ioThreads, .backend. can be removed.

zhztheplayer commented 3 weeks ago

.backend. can be removed.

Yes that's one possible solution. If we drop .backend, then for the option keys that are common to all backends, say spark.gluten.sql.columnar.broadcastJoin, I think we could insert a common. or something to distinguish the namespace with backend configs, e.g., rename it to spark.gluten.common.broadcastJoin.

Basically we have two ideas:

Yohahaha commented 3 weeks ago

.backend. can be removed.

Yes that's one possible solution. If we drop .backend, then for the option keys that are common to all backends, say spark.gluten.sql.columnar.broadcastJoin, I think we could insert a common. or something to distinguish the namespace with backend configs, e.g., rename it to spark.gluten.common.broadcastJoin.

Basically we have two ideas:

  • Idea 1, backend. for backend configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.backend.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.broadcastJoin
  • Idea 2, common. for common configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.common.broadcastJoin

We could initial a vote on this.

zhztheplayer commented 3 weeks ago

.backend. can be removed.

Yes that's one possible solution. If we drop .backend, then for the option keys that are common to all backends, say spark.gluten.sql.columnar.broadcastJoin, I think we could insert a common. or something to distinguish the namespace with backend configs, e.g., rename it to spark.gluten.common.broadcastJoin. Basically we have two ideas:

  • Idea 1, backend. for backend configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.backend.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.broadcastJoin
  • Idea 2, common. for common configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.common.broadcastJoin

We could initial a vote on this.

To me 2 looks better so far, since it categorizes the configs to fixed length of prefixes, which is 3, e.g., s.g.common, and s.g.velox. and it's closer to your suggestion either. I think we can put the direction on 2, once someone questioned, then we can decide whether to vote.

Yohahaha commented 3 weeks ago

.backend. can be removed.

Yes that's one possible solution. If we drop .backend, then for the option keys that are common to all backends, say spark.gluten.sql.columnar.broadcastJoin, I think we could insert a common. or something to distinguish the namespace with backend configs, e.g., rename it to spark.gluten.common.broadcastJoin. Basically we have two ideas:

  • Idea 1, backend. for backend configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.backend.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.broadcastJoin
  • Idea 2, common. for common configs

    • spark.gluten.sql.columnar.backend.velox.ioThread renamed to spark.gluten.velox.ioThread
    • spark.gluten.sql.columnar.broadcastJoin renamed to spark.gluten.common.broadcastJoin

We could initial a vote on this.

To me 2 looks better so far, since it categorizes the configs to fixed length of prefixes, which is 3, e.g., s.g.common, and s.g.velox. and it's closer to your suggestion either. I think we can put the direction on 2, once someone questioned, then we can decide whether to vote.

I'm ok with s.g.velox and s.g.common.

zml1206 commented 3 weeks ago

In addition, I suggest adding GlutenStaticConfig and moving static conf here.