datahub-project / datahub


ETL job: how to disable Kerberos? #381

Closed lonely7345 closed 7 years ago

lonely7345 commented 7 years ago

{ "wh_etl_job_name": "HIVE_DATASET_METADATA_ETL", "ref_id": 10001, "cron_expr": "* * * * * ?", "properties": { "hive.metastore.jdbc.url": "jdbc:mysql://localhost/hive", "hive.metastore.jdbc.driver": "com.mysql.jdbc.Driver", "hive.metastore.username": "root", "hive.metastore.password": "ttt", "hive.schema_json_file": "/home/schema.json", "hive.schema_csv_file":"/home/schema.csv", "hive.field_metadata":"/home/field.csv", "hive.hdfs_map_csv_file":"/home/hdfs.csv", "hdfs.namenode.ipc.uri":"hdfs://localhost:8020", "kerberos.auth":"false" }, "timeout": null, "next_run": null, "comments": "hive vm metadata etl" }

I used the JSON above to add an ETL job, but when the job runs it still uses kerberos.principal. Why?

```python
self.schema_url_helper = SchemaUrlHelper.SchemaUrlHelper(hdfs_namenode_ipc_uri, kerberos_auth, kerberos_principal, keytab_file)
```

kerberos_auth is false, so why does it still use Kerberos?
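
For context, a minimal sketch of why this can happen. Only `SchemaUrlHelper` and the property key names come from the config and snippet above; the function name and lookup logic are assumptions for illustration, not the actual WhereHows source:

```python
import SchemaUrlHelper  # module referenced by the line quoted above


def build_schema_url_helper(props):
    """Build the HDFS schema-URL helper from the job's "properties" dict.

    Hypothetical reconstruction: if the extractor looks up the principal
    and keytab keys unconditionally, the job demands them even when
    kerberos.auth is "false".
    """
    hdfs_namenode_ipc_uri = props["hdfs.namenode.ipc.uri"]
    kerberos_auth = props["kerberos.auth"]            # "false" in this job
    kerberos_principal = props["kerberos.principal"]  # KeyError if the key is absent
    keytab_file = props["kerberos.keytab.file"]       # KeyError if the key is absent
    return SchemaUrlHelper.SchemaUrlHelper(
        hdfs_namenode_ipc_uri, kerberos_auth, kerberos_principal, keytab_file)
```

In other words, the flag is merely passed through to the helper; nothing short-circuits the lookups that precede it.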

msropp commented 7 years ago

@lonely7345, even with kerberos_auth set to false, we still have to define the other two Kerberos variables as empty strings to avoid Kerberos errors:

```json
"kerberos.principal": "",
"kerberos.keytab.file": ""
```

Perhaps that helps as a workaround. I agree with your issue, though: with kerberos.auth set to false, the WhereHows code should not require any of the other Kerberos variables...