datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

Error in ETL HiveTransform.py: IndexError: index out of range: 1 #1031

Closed: richardxin closed this issue 4 years ago

richardxin commented 6 years ago

Line 571 of HiveTransform.py (Tag v1.0.0)

    2018-03-13 20:14:39 ERROR Job Launcher:91 - Traceback (most recent call last):
      File "", line 571, in
    IndexError: index out of range: 1

    at org.python.core.Py.IndexError(Py.java:274)
    at org.python.core.SequenceIndexDelegate.checkIdxAndGetItem(SequenceIndexDelegate.java:63)
    at org.python.core.PySequence.seq___getitem__(PySequence.java:377)
    at org.python.core.PySequence.__getitem__(PySequence.java:373)
    at org.python.pycode._pyx7.f$0(<iostream>:581)
    at org.python.pycode._pyx7.call_function(<iostream>)
    at org.python.core.PyTableCode.call(PyTableCode.java:167)
    at org.python.core.PyCode.call(PyCode.java:18)
    at org.python.core.Py.runCode(Py.java:1386)
    at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
    at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:291)
    at metadata.etl.dataset.hive.HiveMetadataEtl.transform(HiveMetadataEtl.java:57)
    at metadata.etl.EtlJob.run(EtlJob.java:182)
    at metadata.etl.Launcher.main(Launcher.java:86)
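
For context, "IndexError: index out of range: 1" from Jython means the script read element [1] of a sequence that turned out to have fewer than two elements. A minimal sketch of that failure mode and a defensive guard follows; it is illustrative only and not the actual code at line 571 of HiveTransform.py:

    # Illustrative only -- not the actual HiveTransform.py code at line 571.
    # A record with fewer fields than expected makes parts[1] blow up.
    record = "db_name_only"            # hypothetical malformed input row
    parts = record.split('.')         # -> ['db_name_only'], a single element
    # table = parts[1]                # would raise IndexError: index out of range: 1

    # Defensive version: check the length before indexing.
    table = parts[1] if len(parts) > 1 else None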

I added logging before calling transform in HiveMetadataEtl.java:

2018-03-13 20:14:38 INFO m.e.d.h.HiveMetadataEtl:56 - before call scripts Transform.py['', {'kerberos.auth': 'False', 'hive.dependency_csv_file': '/var/tmp/hive_dependency.csv', 'wh.exec.id': '285', 'hive.metastore.jdbc.url': 'jdbc:mysql://[host_name_here]:3306/hive', 'hdfs.namenode.ipc.uri': 'your_namenode_ipc_uri', 'whEtlId': '285', 'hive.metastore.password': '[password_here]', 'job.ref.id': '1', 'wherehows.db.jdbc.url': 'jdbc:mysql://[host_name_here]:3306/wherehows', 'hive.hdfs_map_csv_file': '/var/tmp/hive_hdfs_map.csv', 'hive.field_metadata': '/var/tmp/hive_field_metadata.csv', 'job.timeout': '12000', 'innodb_lock_wait_timeout': '1500', 'wherehows.app_folder': '/var/tmp/wherehows', 'hive.instance_csv_file': '/var/tmp/hive_instance.csv', 'hive.schema_csv_file': '/var/tmp/hive_schema.csv', 'hive.metastore.reconnect.time': 'your_reconnect_interval_seconds', 'krb5.kdc': 'your_kdc', 'hive.schema_json_file': '/var/tmp/hive_schema.json', 'wherehows.db.username': 'wherehows', 'wherehows.db.driver': 'com.mysql.jdbc.Driver', 'db.id': '1', 'hive.metastore.username': 'master', 'kerberos.principal': 'your_principal', 'hive.metastore.jdbc.driver': 'com.mysql.jdbc.Driver', 'hive.database_white_list': 'hive', 'krb5.realm': 'your_realm', 'job.cron.expr': '0 0/15 * ?', 'kerberos.keytab.file': 'your_keytab_file', 'hive.database_black_list': 'your_databsae_black_list', 'job.class': 'metadata.etl.dataset.hive.HiveMetadataEtl', 'wherehows.db.password': 'wherehows'}]
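
Since that single-line dump is hard to scan, here is a small, hypothetical helper (not part of WhereHows) that prints such a properties dict one key per line:

    # Hypothetical helper, not part of WhereHows: pretty-print a job-properties
    # dict like the one logged above, one key per line, sorted for easier reading.
    def dump_props(props):
        for key in sorted(props):
            print("%-35s = %s" % (key, props[key]))

    dump_props({'kerberos.auth': 'False', 'db.id': '1', 'hive.database_white_list': 'hive'})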

richardxin commented 6 years ago

Also, this error seems to be sporadic.
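
Since the failure is sporadic, it is likely triggered only by certain input records. A debugging sketch (names are assumptions, not WhereHows code): wrap the indexing in a try/except so the offending value is logged instead of killing the whole job:

    # Debugging sketch with assumed names -- not actual WhereHows code.
    # Logging the record that triggers the IndexError helps pin down which
    # sporadic input is responsible instead of aborting the whole run.
    def safe_second_field(fields):
        try:
            return fields[1]
        except IndexError:
            print("Unexpected record shape: %r" % (fields,))
            return None

    safe_second_field(['only_one_value'])   # logs the bad record, returns None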

keremsahin1 commented 4 years ago

Dear issue owner,

Thanks for your interest in WhereHows. We recently announced DataHub, the rebranding of WhereHows: LinkedIn reworked the WhereHows architecture, replaced its metadata infrastructure, and rebranded it as DataHub. DataHub is a more advanced and capable metadata management product than WhereHows.

Unfortunately, we have to stop supporting WhereHows so we can focus on DataHub and better help DataHub users. We will therefore close all issues related to WhereHows and will not accept further contributions to it. Active development of DataHub has already started on the datahub branch and will continue there until it is merged to master and the project is renamed to DataHub.

Please check out the datahub branch to get familiar with DataHub.

Best, DataHub team