apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] Kyuubi Spark Authz plugin can't sync the policy from Apache Ranger #3791

Closed: xiaozhch5 closed this issue 1 year ago

xiaozhch5 commented 1 year ago

Describe the bug

Hello, I use Apache Ranger to control the access policies of the Hadoop cluster, and Kyuubi syncs the Hive policies from it. When I configured the properties following the instructions at "https://kyuubi.apache.org/docs/latest/security/authorization/spark/install.html", a problem arose:

image

The log shows that the request is not authenticated by Ranger Admin.

So I changed the code of org.apache.ranger.admin.client.RangerAdminRESTClient to add a username and password, so that the request is authenticated by Ranger Admin.

image

image

After this change, in both client mode and cluster mode, Kyuubi Spark can successfully sync the policies from Ranger Admin.

Affects Version(s)

1.6.0

github-actions[bot] commented 1 year ago

Hello @xiaozhch5, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).

pan3793 commented 1 year ago

It does not sound like a Kyuubi-side issue, and I'm not even sure whether it's a bug or caused by incorrect usage. The 401 response indicates that there is some issue with authentication/authorization.

xiaozhch5 commented 1 year ago

> It does not sound like a Kyuubi-side issue, and I'm not even sure whether it's a bug or caused by incorrect usage. The 401 response indicates that there is some issue with authentication/authorization.
>
>   • have you checked the Ranger server log?
>   • what are your configurations of each component?

It may be a configuration issue in setting up the Kyuubi Spark AuthZ plugin.

At present, according to the instructions in the document at "https://kyuubi.apache.org/docs/latest/security/authorization/spark/install.html#ranger-spark-security-xml", the property 'ranger.plugin.spark.policy.source.impl' is set to 'org.apache.ranger.admin.client.RangerAdminRESTClient'.

image

When Spark syncs the Hive policies from Ranger Admin, it should use the username and password to authenticate against the URL http://ranger-admin.org:6080.

But by default, when the property 'ranger.plugin.spark.policy.source.impl' is set to 'org.apache.ranger.admin.client.RangerAdminRESTClient', it does not authenticate with a username and password. Therefore, I got the 401 response.

So I changed some code in 'org.apache.ranger.admin.client.RangerAdminRESTClient' to pass the username and password to it, so that Spark can authenticate against http://ranger-admin.org:6080 and successfully sync the policies from Ranger Admin.

Maybe the instructions in the document at "https://kyuubi.apache.org/docs/latest/security/authorization/spark/install.html#ranger-spark-security-xml" are incomplete?

xiaozhch5 commented 1 year ago

Full content of ranger-spark-security.xml:

<configuration>
    <property>
        <name>ranger.plugin.spark.service.name</name>
        <value>cluster_hive</value>
        <description>Name of the Ranger service containing policies for this SPARK instance</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.source.impl</name>
        <value>org.apache.ranger.admin.client.RangerAdminRESTClient2</value>
        <description>Class to retrieve policies from the source</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.rest.url</name>
        <value>http://host121:6080</value>
        <description>URL to Ranger Admin</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.rest.url.username</name>
        <value>admin</value>
        <description>username to Ranger Admin</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.rest.url.password</name>
        <value>admin123#</value>
        <description>password to Ranger Admin</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.pollIntervalMs</name>
        <value>30000</value>
        <description>How often to poll for changes in policies?</description>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.cache.dir</name>
        <value>/etc/ranger/cluster_spark/policycache</value>
        <description>Directory where Ranger policies are cached after successful retrieval from the source</description>
    </property>
</configuration>
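
When comparing configurations across components, it can help to dump the properties that a file like the one above actually defines. The following is a minimal Python sketch, assuming the Hadoop-style `<configuration>/<property>/<name>/<value>` layout shown above. The function name `load_properties` and the shortened `SAMPLE` document are illustrative only and are not part of Ranger or Kyuubi.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample mirroring the layout of ranger-spark-security.xml.
SAMPLE = """<configuration>
    <property>
        <name>ranger.plugin.spark.service.name</name>
        <value>cluster_hive</value>
    </property>
    <property>
        <name>ranger.plugin.spark.policy.rest.url</name>
        <value>http://host121:6080</value>
    </property>
</configuration>"""


def load_properties(xml_text):
    """Return a dict mapping each property <name> to its <value> text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    for key, value in sorted(load_properties(SAMPLE).items()):
        print(f"{key} = {value}")
```

Running this against the real file (read with `open(path).read()`) gives a quick way to confirm which `policy.source.impl` and `policy.rest.url` values are in the file before looking at the Ranger server log.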

pan3793 commented 1 year ago

> When Spark syncs the Hive policies from Ranger Admin, it should use the username and password to authenticate ...

WHY SHOULD IT use the username and password?

xiaozhch5 commented 1 year ago

> When Spark syncs the Hive policies from Ranger Admin, it should use the username and password to authenticate ...
>
> WHY SHOULD IT use the username and password?

So that Spark can successfully sync the policies. For example, with the curl tool, fetching the policies works when the username and password are supplied.

image

But I get nothing without the username and password.

image
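
The difference between the two curl calls is whether an HTTP Basic `Authorization` header is sent. As a rough illustration (not Ranger or Kyuubi code), this Python sketch builds the header that `curl -u user:password` adds to a request. The helper names, the host `ranger-admin.example`, and the `admin`/`admin` credentials are placeholders; the URL path is the one quoted later in this thread. No request is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_policy_request(url, username, password):
    """Prepare (but do not send) a request carrying Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    req = build_policy_request(
        "http://ranger-admin.example:6080/service/plugins/secure/policies/download/spark_service",
        "admin",
        "admin",
    )
    print(req.get_header("Authorization"))  # Basic YWRtaW46YWRtaW4=
```

A client that never attaches this header (or another accepted credential, such as a Kerberos ticket) to the policy download request would be expected to receive the 401 response described above.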

pan3793 commented 1 year ago

So the key point here is that Ranger does not expose an API to set up basic authentication information, so Kyuubi cannot do that (maybe we could use reflection, but that may be a dirty way). So it's not a Kyuubi bug; you'd better push the Ranger community to expose a new API that downstream projects can leverage.

xiaozhch5 commented 1 year ago

> So the key point here is that Ranger does not expose an API to set up basic authentication information, so Kyuubi cannot do that (maybe we could use reflection, but that may be a dirty way). So it's not a Kyuubi bug; you'd better push the Ranger community to expose a new API that downstream projects can leverage.

Thank you very much for your reply. I will close this issue and raise it in the Ranger community.

xiaozhch5 commented 1 year ago

@pan3793 Hello, I tested again and put

    <property>
        <name>ranger.credential.provider.path</name>
        <value>/etc/ranger/admin/rangeradmin.jceks</value>
    </property>

into ranger-spark-security.xml, and Spark now syncs policies from Ranger Admin correctly.

Maybe we can improve the document at "https://kyuubi.apache.org/docs/latest/security/authorization/spark/install.html" and add the property listed above to ranger-spark-security.xml.

pan3793 commented 1 year ago

This configuration seems to depend tightly on your environment.

risyomei commented 1 year ago

I am having this issue too. Normally, Ranger policies are synced using GSSAPI (Kerberos authentication); I'm not sure whether Kyuubi plans to support that.

pan3793 commented 1 year ago

@risyomei the original reporter was talking about basic authentication, but you are talking about GSSAPI (Kerberos) authentication. Are you SURE you are hitting the same issue?

risyomei commented 1 year ago

Even in a fully kerberized cluster, we see that authz tries to use simple authentication when accessing Ranger Admin, and it fails with the same error message.

Therefore, I think there are two ways of resolving this issue:

  1. Using the credential store, as @xiaozhch5 suggests in the comment (https://github.com/apache/kyuubi/issues/3791#issuecomment-1315249377).
  2. Using Kerberos (GSSAPI) to access Ranger. <-- This is what I am not sure Kyuubi plans to support.

pan3793 commented 1 year ago

> ... Using Kerberos (GSSAPI) to access Ranger. <-- This is what I am not sure Kyuubi plans to support.

It's already done, but with some limitations, e.g. it does not work in Spark cluster mode if you use --proxy-user to submit Spark engines.

risyomei commented 1 year ago

Thank you very much for your confirmation, that's exactly what I wanted to know. I was having a problem with Spark cluster mode + Kyuubi LDAP Authentication.

Is there any way to use Spark cluster mode + Kyuubi LDAP Authentication + authz at the moment?

pan3793 commented 1 year ago

I explained the technical details at https://github.com/apache/kyuubi/discussions/3620, which also mentions some workarounds and an approach to support it directly (looking forward to a volunteer sending a PR).

risyomei commented 1 year ago

@pan3793 Thank you very much for the explanation. Totally answered my question! 🙇

isteven-xu commented 1 year ago

I got the same issue as you, but it did not work when I changed the code to add the username and password. I replaced ranger-plugin-common-2.3.0.jar in the Spark classpath and still got 401, although a test with curl, like: curl -v -u admin:admin http://ip:6080/service/plugins/secure/policies/download/spark_service works fine.

isteven-xu commented 1 year ago

I changed the code and it works.

image