apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0
2.11k stars 914 forks source link

[KYUUBI #6402]: engine.share.level=GROUP enable for a list of hadoop … #6779

Open Madhukar525722 opened 4 weeks ago

Madhukar525722 commented 4 weeks ago

…groups

:mag: Description

Issue References 🔗

This pull request fixes #6402

Describe Your Solution 🔧

Currently, the group level engine is getting launched with the first user group itself. This change will allow user to pass in which group they want to launch the engine.

The flow of implementation is :

  1. In this way, for every group there will be single engine, which can be re-used.
  2. First user launching the engine is implemented to handle the scenarios where Hadoop groups are not the yarn users.
  3. For security concerns in 2nd point, we can have fine grained access control on session level, using Apache Ranger.

Types of changes :bookmark:

Test Results🧪

  1. When PREFERRED_GROUPS were not defined, it used the first group from the list Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c application ID: application_1728291907264_43966 User: madlnu

  2. When valid PREFERRED_GROUPS were defined Spark application name: kyuubi_GROUP_SPARK_SQL_kyuubi_test_b_default_be9a16a8-be38-4ab6-bee9-1934f8556f18 application ID: application_1728291907264_43968 User: madlnu

  3. When no PREFERRED_GROUPS were matching, it used the first group from the list Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c application ID: application_1728291907264_43966 User: madlnu

  4. When any other user X tries to access the existing GROUP engine, it uses the same engine Spark application name: kyuubi_GROUP_SPARK_SQL_Internet_Users_default_d2826b5b-3f1e-48a0-b42f-d248da914b7c application ID: application_1728291907264_43966 User: madlnu


Checklist 📝

Be nice. Be informative.

Madhukar525722 commented 3 weeks ago

Hi @pan3793 @bowenliang123 @turboFei . Please review the implementation. Thanks

codecov-commenter commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (d3520dd) to head (2c6f270).

Files with missing lines Patch % Lines
...rg/apache/kyuubi/session/HadoopGroupProvider.scala 0.00% 13 Missing :warning:
...in/scala/org/apache/kyuubi/config/KyuubiConf.scala 0.00% 6 Missing :warning:
...ain/scala/org/apache/kyuubi/engine/EngineRef.scala 0.00% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6779 +/- ## ====================================== Coverage 0.00% 0.00% ====================================== Files 687 687 Lines 42439 42459 +20 Branches 5793 5799 +6 ====================================== - Misses 42439 42459 +20 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bowenliang123 commented 3 weeks ago

To support preferred the group name in session conf, I would suggest to do the following:

  1. add a session conf key in KyuubiReservedKeys rather than in adding a server side config for a global preferred group (you are not even using its value anyway).
  2. add a server config for choosing the select policy, eg. 'head', 'prefered_session_conf'

And I have worries about using the session conf will interference choosing the engine reference, as it's easy to be changed at runtime. Is there any better approach with solid connection variables?

pan3793 commented 3 weeks ago

@Madhukar525722 thanks for taking care of this feature, I leave the comments to add a configuration "kyuubi.session.preferGroup" previously, but I have a little different idea now.

Now I would suggest having a "kyuubi.session.preferredGroups" (Seq[String]), when present, we select the most preferred group from the whole group list, otherwise, take the head, instead of failing fast. For the implementatio, we can use a custom Comparator to acheive that.

And I have worries about using the session conf will interference choosing the engine reference, as it's easy to be changed at runtime. Is there any better approach with solid connection variables?

seems there is no much differences from the existing properties like kyuubi.session.user?

Madhukar525722 commented 3 weeks ago

Hi @pan3793 , I have considered your suggestion for using kyuubi.session.preferredGroups. Please review.

Madhukar525722 commented 2 weeks ago

Hi @pan3793 , Please review the change, integration tests failed due to timeout. In previous it was green

pan3793 commented 1 week ago

@Madhukar525722 sorry for late reply, I'm quite busy these days, will take a look soon