apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [Hive] Hive job fails with error KerberosName$NoMatchingRule: No rules applied to xyz@HADOOP.COM #7676

Open arshadmohammad opened 2 weeks ago

arshadmohammad commented 2 weeks ago

What happened

Executing a job that contains a Hive source fails with the error KerberosName$NoMatchingRule: No rules applied to xyz@HADOOP.COM.

SeaTunnel Version

2.3.7

SeaTunnel Config

source {
Hive {
    parallelism=1
    "connection_check_timeout_sec"=30
    "use_select_count"="false"
    "skip_analyze"="false"
    "split.size"=8096
    "split.even-distribution.factor.upper-bound"=100
    "split.even-distribution.factor.lower-bound"=0.05
    "split.sample-sharding.threshold"=1000
    "split.inverse-sampling.rate"=1000
    "result_table_name"=Table14982155829824
    "table_name"="hive_table.my_hive_db"
    "kerberos_krb5_conf_path"="/some/path/krb5.conf"
    "hive_site_path"="/some/path/hive-site.xml"
    "hive.hadoop.conf" {
        "hive.files.ignore.invalid.path"="true"
    }
    "metastore_uri"="thrift://host.domain.com:9083"
    "kerberos_keytab_path"="/some/path/service.keytab"
    "kerberos_principal"="service@HADOOP.COM"
    "hive.hadoop.conf-path"="/some/path"
    "hdfs_site_path"="/some/path/hdfs-site.xml"
}
}

Running Command

sh bin/seatunnel.sh --config ../seatunnel-web/profile/14905258453056.conf

Error Exception

Caused by: org.apache.hadoop.security.KerberosAuthException: failure to login: for principal:

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

Java

Screenshots

No response

arshadmohammad commented 2 weeks ago

There are two main issues with HiveMetaStoreProxy in this case:

  1. The hadoop.security.auth_to_local setting is typically defined in core-site.xml, but only hive-site.xml is read from the directory given by hive.hadoop.conf-path. Ideally, all relevant *-site.xml files in that directory should be read.
  2. When authenticating with Kerberos, the Configuration object is not populated correctly.
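
A minimal sketch of the first point, assuming the fix is to collect every site file present in the hive.hadoop.conf-path directory rather than only hive-site.xml. The class name SiteFileLocator and the list of file names are hypothetical and are not taken from the SeaTunnel code base. The sketch uses only the JDK; the comments note where the Hadoop calls would go.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SeaTunnel.
public class SiteFileLocator {

    // Assumed set of site files. core-site.xml comes first because
    // hadoop.security.auth_to_local is typically defined there.
    static final List<String> SITE_FILES =
            Arrays.asList("core-site.xml", "hdfs-site.xml", "hive-site.xml");

    // Returns the site files that actually exist in confDir, in SITE_FILES order.
    static List<Path> existingSiteFiles(Path confDir) {
        List<Path> found = new ArrayList<>();
        for (String name : SITE_FILES) {
            Path candidate = confDir.resolve(name);
            if (Files.isRegularFile(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "/some/path" mirrors the hive.hadoop.conf-path value in the report.
        Path confDir = Paths.get(args.length > 0 ? args[0] : "/some/path");
        for (Path siteFile : existingSiteFiles(confDir)) {
            // With Hadoop on the classpath, each file would be added here via
            // Configuration.addResource(new org.apache.hadoop.fs.Path(siteFile.toString())).
            System.out.println("would load: " + siteFile);
        }
        // For the second point, the populated Configuration would then be passed to
        // UserGroupInformation.setConfiguration(conf) before calling
        // UserGroupInformation.loginUserFromKeytab(principal, keytabPath).
    }
}
```

Loading core-site.xml before the Kerberos login matters because the auth_to_local rules have to be in the Configuration when the principal is mapped to a short name. If they are missing, a principal such as xyz@HADOOP.COM matches no rule.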