cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

Hue 4.9 - Hive editor seems to fail while fetching table info from the metastore #1997

Closed (elukey closed this issue 3 years ago)

elukey commented 3 years ago

Is the issue already present in https://github.com/cloudera/hue/issues or discussed in the forum https://discourse.gethue.com?

Not that I can see :)

Describe the bug:

After logging in, the Hive editor doesn't show the preview of the available Hive tables for a given database. It also doesn't seem possible to get a valid Hive session (at least according to the UI), even if I try to refresh it. As far as I can see, no errors are logged on the Hive side. The following error is present in Hue's error.log, and a preview of it also pops up in the UI:

[12/Apr/2021 08:15:11 +0000] api          ERROR    Autocomplete data fetching error
Traceback (most recent call last):
  File "/usr/lib/hue/apps/beeswax/src/beeswax/api.py", line 117, in _autocomplete
    tables_meta = db.get_tables_meta(database=database)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/dbms.py", line 374, in get_tables_meta
    tables = self.client.get_tables_meta(database, identifier)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 1508, in get_tables_meta
    tables = self._client.get_tables_meta(database, table_names, table_types)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 850, in get_tables_meta
    (res, session) = self.call(self._client.GetTables, req)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 728, in call
    return self.call_return_result_and_session(fn, req, status, session=session)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 767, in call_return_result_and_session
    return self._call_return_result_and_session(fn, req, status=status, session=session)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 790, in _call_return_result_and_session
    raise QueryServerException(Exception(message), message=message)
beeswax.server.dbms.QueryServerException: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException Cannot set Object parameter: value = MANAGED_TABLE for column "" : Could not set parameter at position 1 (values was 'MANAGED_TABLE')
Query - conn:28756(M)  - "SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.OWNER,A0.RETENTION,A0.IS_REWRITE_ENABLED,A0.TBL_NAME,A0.TBL_TYPE,A0.TBL_ID FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID WHERE LOWER(B0.`NAME`) LIKE 'default' ESCAPE '\\' AND ((((A0.TBL_TYPE = ?) OR (A0.TBL_TYPE = ?)) OR (A0.TBL_TYPE = ?)) OR (A0.TBL_TYPE = ?))")
[12/Apr/2021 08:15:39 +0000] decorators   ERROR    Error running close_session
Traceback (most recent call last):
  File "/usr/lib/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 98, in decorator
    return func(*args, **kwargs)
  File "/usr/lib/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 255, in close_session
    session_record = dbms.get(self.user, query_server).close_session(session_record)
  File "/usr/lib/hue/apps/beeswax/src/beeswax/server/dbms.py", line 562, in close_session
    raise QueryServerException(_('Failed to close session, session handle may already be closed or timed out.'))
beeswax.server.dbms.QueryServerException: Failed to close session, session handle may already be closed or timed out.

Steps to reproduce it?

Run Hue 4.9 vanilla with Hive 2.3.6 (Apache Bigtop).

Hue version or source? (e.g. open source 4.5, CDH 5.16, CDP 1.0...). System info (e.g. OS, Browser...).

opensource 4.9

elukey commented 3 years ago

I am using thrift_version=7 in the hue.ini settings; 11 doesn't work.

elukey commented 3 years ago

It is interesting that I can see the list of Hive databases in the Hue editor, but I cannot explore any of them.

romainr commented 3 years ago

So v7 works? Using an old Hive?

elukey commented 3 years ago

@romainr v7 leads to the error above, although it somehow gets through the "list databases" part correctly. It fails when it gets to listing the tables. Note that 4.8 doesn't have this issue, so I suspect something changed in the new release?

Also, Hive 2.3.6 is not the latest upstream release (IIUC that is the 3.x series), but it is not really old :)

elukey commented 3 years ago

One suspicious log that I noticed is:

thrift_util  WARNING  Unable to unpack the secret and guid in Thrift Handle: unpack requires a buffer of 16 bytes

elukey commented 3 years ago

The issue is not related to the upgrade: I re-tested 4.8 and had the same problem. Interestingly, with 4.8 I spotted the following errors in the Hive metastore logs (with 4.9 I only see logs on the Hue side):

java.sql.SQLException: Could not set parameter at position 1 (values was 'MANAGED_TABLE')
Query - conn:30368(M)  - "SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,A0.CREATE_TIME,A0.LAST_ACCESS_TIME,A0.OWNER,A0.RETENTION,A0.IS_REWRITE_ENABLED,A0.TBL_NAME,A0.TBL_TYPE,A0.TBL_ID FROM TBLS A0 LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID WHERE LOWER(B0.`NAME`) LIKE 'default' ESCAPE '\\' AND ((((A0.TBL_TYPE = ?) OR (A0.TBL_TYPE = ?)) OR (A0.TBL_TYPE = ?)) OR (A0.TBL_TYPE = ?))"
        at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getSqlException(ExceptionMapper.java:271)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.setParameter(MariaDbPreparedStatementClient.java:480)
        at org.mariadb.jdbc.BasePrepareStatement.setString(BasePrepareStatement.java:1533)
        at com.jolbox.bonecp.PreparedStatementHandle.setString(PreparedStatementHandle.java:1013)
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.setString(ParamLoggingPreparedStatement.java:885)
        at org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setObject(CharRDBMSMapping.java:424)
        at org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setObject(SingleFieldMapping.java:190)
        at org.datanucleus.store.rdbms.sql.SQLStatementHelper.applyParametersToStatement(SQLStatementHelper.java:248)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:610)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1855)
        at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
        at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368)
        at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267)
        at org.apache.hadoop.hive.metastore.ObjectStore.getTableMeta(ObjectStore.java:1265)
        at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
        at com.sun.proxy.$Proxy25.getTableMeta(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_meta(HiveMetaStore.java:2006)
        at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy26.get_table_meta(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_meta.getResult(ThriftHiveMetastore.java:11429)

In the test environment where I am testing Hue 4.9, I have also upgraded the Hive server/metastore host to Debian 10. Part of that upgrade was switching the JDBC driver used by the Hive metastore to the MariaDB one, since it is the only one available in the new version of the OS (namely, the only driver jar shipped is the MariaDB one). Everything else works fine, but perhaps this particular Hue query is something the MariaDB JDBC connector doesn't like? (We have been using MariaDB as the database for a long time.)

elukey commented 3 years ago

My impression is that Hue tries to get the metadata about the Hive tables from the Hive server, which in turn asks the metastore, and the metastore then runs a query like the SELECT DISTINCT above against its database. That query fails when the MariaDB JDBC driver is used (instead of com.mysql.jdbc.Driver). I haven't seen a similar issue with any of the other tools that I am testing (Oozie, Spark, Hive itself, Presto, etc.); Hue is the only one that triggers it. Any help or suggestions?
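
One way to check this suspicion from the logs alone is to look at which JDBC driver package shows up in the metastore stack trace. Below is a minimal sketch, assuming the trace is available as plain text. The function name `jdbc_drivers_in_trace` and the prefix table are illustrative only and are not part of Hue or Hive; the sample is a shortened copy of the trace above.

```python
import re

# Package prefixes mapped to a driver label. The two entries mirror the
# drivers named in this thread; the table is an assumption, not exhaustive.
DRIVER_PREFIXES = {
    "org.mariadb.jdbc.": "MariaDB Connector/J",
    "com.mysql.jdbc.": "MySQL Connector/J",
}

# Matches Java stack frames such as "    at some.pkg.Class.method(File.java:1)".
FRAME_RE = re.compile(r"^\s*at\s+([\w.$]+)\(")


def jdbc_drivers_in_trace(trace: str) -> set:
    """Return the labels of JDBC drivers whose classes appear in stack frames."""
    found = set()
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        frame = match.group(1)
        for prefix, label in DRIVER_PREFIXES.items():
            if frame.startswith(prefix):
                found.add(label)
    return found


# Shortened sample taken from the metastore log quoted in this issue.
SAMPLE = """\
java.sql.SQLException: Could not set parameter at position 1 (values was 'MANAGED_TABLE')
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.setParameter(MariaDbPreparedStatementClient.java:480)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:610)
        at org.apache.hadoop.hive.metastore.ObjectStore.getTableMeta(ObjectStore.java:1265)
"""

if __name__ == "__main__":
    print(jdbc_drivers_in_trace(SAMPLE))
```

A result containing only the MariaDB label would at least confirm that the failing call goes through that driver, without saying why the driver rejects the parameter.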

romainr commented 3 years ago

Does it fail for all the tables or only the managed ones? Seems like a bug in the Thrift compat.

PS: the warning "thrift_util WARNING Unable to unpack the secret and guid in Thrift Handle: unpack requires a buffer of 16 bytes" might be an issue that was fixed recently for Python 3. I would recommend a quick test with the master branch until the 4.10 release comes out.
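
For context, the wording of that warning matches what Python 3's struct module reports when a format that needs 16 bytes is given a buffer of a different length. Below is a minimal sketch of that behaviour. The `QQ` format (two unsigned 64-bit integers) is an assumption made for illustration and is not confirmed anywhere in this thread, and `try_unpack` is a hypothetical helper, not a Hue function.

```python
import struct

# Assumed layout: two unsigned 64-bit integers, 16 bytes in total.
GUID_FORMAT = "QQ"


def try_unpack(buf: bytes):
    """Return (values, None) on success or (None, error message) on failure."""
    try:
        return struct.unpack(GUID_FORMAT, buf), None
    except struct.error as exc:
        return None, str(exc)


# Exactly 16 bytes: unpacks into two integers.
ok_values, ok_error = try_unpack(bytes(16))

# Only 8 bytes: struct raises an error that mentions the required size.
bad_values, bad_error = try_unpack(bytes(8))

if __name__ == "__main__":
    print(ok_values, ok_error)
    print(bad_values, bad_error)
```

This only shows where such a message can come from; it does not establish what actually produced a wrongly sized buffer in Hue.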

elukey commented 3 years ago

@romainr thanks a lot for the brainbounce. The issue seems to be related to the JDBC driver class org.mariadb.jdbc.Driver when used for the Hive metastore (Debian Buster doesn't provide libmysql-java anymore). So it is not related to Hue; closing!
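
For anyone landing here with the same symptom: the driver class the metastore uses is normally set through the `javax.jdo.option.ConnectionDriverName` property in hive-site.xml. Below is a minimal sketch for checking it, assuming the file content is available as a string. The helper name `metastore_jdbc_driver` and the sample configuration (including its connection URL) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Hive metastore property holding the JDBC driver class name.
DRIVER_PROPERTY = "javax.jdo.option.ConnectionDriverName"


def metastore_jdbc_driver(hive_site_xml: str):
    """Return the configured JDBC driver class, or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == DRIVER_PROPERTY:
            return prop.findtext("value", default="").strip()
    return None


# Hypothetical hive-site.xml content; the host and database name are made up.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://db.example.org/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.mariadb.jdbc.Driver</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(metastore_jdbc_driver(SAMPLE_HIVE_SITE))
```

If this reports org.mariadb.jdbc.Driver on a setup that shows the error above, the driver is a reasonable first suspect before looking at Hue itself.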