Open zhztheplayer opened 3 weeks ago
This is only a draft plan. Should indeed revisit before starting to work on it.
that's great!
I suggest spark.gluten.sql.columnar.backend.velox.ioThreads
could be changed to spark.gluten.velox.ioThreads
, .backend.
can be removed.
.backend. can be removed.
Yes that's one possible solution. If we drop .backend
, then for the option keys that are common to all backends, say spark.gluten.sql.columnar.broadcastJoin
, I think we could insert a common.
or something to distinguish the namespace with backend configs, e.g., rename it to spark.gluten.common.broadcastJoin
.
Basically we have two ideas:
backend.
for backend configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed to spark.gluten.backend.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed to spark.gluten.broadcastJoin
common.
for common configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed to spark.gluten.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed to spark.gluten.common.broadcastJoin
.backend. can be removed.
Yes that's one possible solution. If we drop
.backend
, then for the option keys that are common to all backends, sayspark.gluten.sql.columnar.broadcastJoin
, I think we could insert acommon.
or something to distinguish the namespace with backend configs, e.g., rename it tospark.gluten.common.broadcastJoin
.Basically we have two ideas:
Idea 1,
backend.
for backend configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.backend.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.broadcastJoin
Idea 2,
common.
for common configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.common.broadcastJoin
We could initial a vote on this.
.backend. can be removed.
Yes that's one possible solution. If we drop
.backend
, then for the option keys that are common to all backends, sayspark.gluten.sql.columnar.broadcastJoin
, I think we could insert acommon.
or something to distinguish the namespace with backend configs, e.g., rename it tospark.gluten.common.broadcastJoin
. Basically we have two ideas:
Idea 1,
backend.
for backend configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.backend.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.broadcastJoin
Idea 2,
common.
for common configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.common.broadcastJoin
We could initial a vote on this.
To me 2 looks better so far, since it categorizes the configs to fixed length of prefixes, which is 3, e.g., s.g.common
, and s.g.velox
. and it's closer to your suggestion either. I think we can put the direction on 2, once someone questioned, then we can decide whether to vote.
.backend. can be removed.
Yes that's one possible solution. If we drop
.backend
, then for the option keys that are common to all backends, sayspark.gluten.sql.columnar.broadcastJoin
, I think we could insert acommon.
or something to distinguish the namespace with backend configs, e.g., rename it tospark.gluten.common.broadcastJoin
. Basically we have two ideas:
Idea 1,
backend.
for backend configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.backend.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.broadcastJoin
Idea 2,
common.
for common configs
spark.gluten.sql.columnar.backend.velox.ioThread
renamed tospark.gluten.velox.ioThread
spark.gluten.sql.columnar.broadcastJoin
renamed tospark.gluten.common.broadcastJoin
We could initial a vote on this.
To me 2 looks better so far, since it categorizes the configs to fixed length of prefixes, which is 3, e.g.,
s.g.common
, ands.g.velox
. and it's closer to your suggestion either. I think we can put the direction on 2, once someone questioned, then we can decide whether to vote.
I'm ok with s.g.velox
and s.g.common
.
In addition, I suggest adding GlutenStaticConfig and moving static conf here.
Description
Gluten's configuration utilities, especially
GlutenConfig.scala
, is getting messed up over time. To keep code clean and easy to maintain, will propose the below changes:Simplify and unify option key formats
spark.gluten.sql.columnar
tpspark.gluten
.spark.gluten.sql.columnar.backend.velox.IOThreads
tospark.gluten.sql.columnar.backend.velox.ioThreads
Move specific configurations to backend module
For better module-wise isolation, we should move configuration definitions specifically for backends to their own module. E.g.,
spark.gluten.sql.columnar.backend.velox.window.type
should be defined inbackends-velox
thangluten-core
.(Velox) Remove the list of configurations passing to native
In Velox backend, we currently have a list to decide whether a config option should be passed to native. Rather than using this way, we can just set a flag when define the config. Or just using Spark's built-in static / non-static config flag to distinguish.
Backward compatibility
Since configuration keys may be modified during doing this, we should maintain a list of configuration keys for backward compatibility. E.g.,
spark.gluten.sql.columnar.backend.velox.IOThreads
can be the alternative ofspark.gluten.backend.velox.ioThreads
. But we should raise a warning when the former is set by user as it's considered deprecated. After several development iterations, support of the old one will be removed completely.