cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

`TypeError: 'unicode' does not have the buffer interface`, using hue on Databricks clusters with SQLAlchemy interface and the Hive connector #1046

Closed ebessah closed 4 years ago

ebessah commented 4 years ago

When connecting to Databricks clusters from Hue using the SQLAlchemy interface and the Hive connector, we received `TypeError: 'unicode' does not have the buffer interface`.

After some days of debugging, we realised that the beeswax application, which is installed and configured as part of Hue and lets you run queries on Apache Hive, ships custom autogenerated Thrift Python code for integration with HiveServer2. Because we were using the Hive connector, whenever Hue was about to establish a connection to Databricks or run a statement, this custom Thrift code tried to serialize the SQL statement, which failed with the TypeError below:

[15/Feb/2020 17:11:22 -0800] sql_alchemy  ERROR    Query Error
Traceback (most recent call last):
  File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 85, in decorator
    return func(*args, **kwargs)
  File "/usr/share/hue/desktop/libs/notebook/src/notebook/connectors/sql_alchemy.py", line 139, in execute
    connection = engine.connect()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2209, in connect
    return self._connection_cls(self, **kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 103, in __init__
    else engine.raw_connection()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2307, in raw_connection
    self.pool.unique_connection, _connection
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 303, in unique_connection
    return _ConnectionFairy._checkout(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 489, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/databricks_dbapi/databricks.py", line 61, in connect
    return hive.connect(database=database, thrift_transport=transport)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 94, in connect
    return Connection(*args, **kwargs)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 205, in __init__
    cursor.execute('USE `{}`'.format(database))
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/pyhive/hive.py", line 364, in execute
    response = self._connection.client.ExecuteStatement(req)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 298, in ExecuteStatement
    self.send_ExecuteStatement(req)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 305, in send_ExecuteStatement
    args.write(self._oprot)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/TCLIService.py", line 1882, in write
    self.req.write(oprot)
  File "/usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py", line 4460, in write
    oprot.writeBinary(self.statement)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 131, in writeBinary
    self.trans.write(str)
  File "/usr/share/hue/build/env/local/lib/python2.7/site-packages/thrift/transport/THttpClient.py", line 142, in write
    self.__wbuf.write(buf)
TypeError: 'unicode' does not have the buffer interface
[15/Feb/2020 17:11:22 -0800] decorators   ERROR    Error running execute

The traceback shows that line 4460 in /usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py does not properly encode a unicode string when writing it out as binary under Python 2.7.

Line 4460 in /usr/share/hue/apps/beeswax/gen-py/TCLIService/ttypes.py reads:

oprot.writeString(self.statement)

Instead of:

oprot.writeString(self.statement.encode('utf-8') if sys.version_info[0] == 2 else self.statement)
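
The failure mode can be reproduced outside Hue. This is a minimal sketch, assuming `io.BytesIO` is a fair stand-in for the transport's write buffer; `write_statement` is a hypothetical helper for illustration, not part of Thrift or Hue. A binary buffer rejects text, and encoding to UTF-8 first avoids the error:

```python
import io

def write_statement(buf, statement):
    """Write a SQL statement to a binary buffer, encoding text first.

    Hypothetical helper for illustration; not part of Thrift or Hue.
    """
    if not isinstance(statement, bytes):
        # Text (unicode on Python 2, str on Python 3) must become bytes.
        statement = statement.encode("utf-8")
    buf.write(statement)

buf = io.BytesIO()

# Writing text straight into a binary buffer raises TypeError.
try:
    buf.write(u"USE `default`")
    rejected = False
except TypeError:
    rejected = True

# Encoding first succeeds.
write_statement(buf, u"USE `default`")
```

The exact wording of the TypeError differs between Python 2 and Python 3, but the cause is the same in both: text was handed to a buffer that only accepts bytes.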

To work around this problem, we upgraded the pip version that ships with Hue and then installed the databricks-dbapi[sqlalchemy] package, which pulls in its dependencies, including an updated, compatible Thrift library that handles unicode encoding:

RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]

We then removed Hue's bundled Thrift-generated code so that the connection falls back to the newly installed Thrift library:

RUN rm -rf /usr/share/hue/apps/beeswax/gen-py

The complete Dockerfile looks like this:

FROM gethue/hue:<latest-stable-hue-version>

USER root

RUN ./build/env/bin/pip install --upgrade pip
RUN ./build/env/bin/pip install databricks-dbapi[sqlalchemy]

ADD hue.ini /usr/share/hue/desktop/conf/z-hue.ini

# Remove custom hue thrift library 
RUN rm -rf /usr/share/hue/apps/beeswax/gen-py

EXPOSE 8888
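
To confirm which copy of the generated code Python actually picks up after the clean-up, something like the following could be run with the interpreter from Hue's virtualenv. This is a sketch, not a Hue utility: `module_origin` is a hypothetical helper, the module name `TCLIService` is taken from the traceback paths above, and running it with the Python next to `./build/env/bin/pip` is an assumption.

```python
import importlib

def module_origin(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)

# Inside the Hue virtualenv one might check, for example:
# print(module_origin("TCLIService"))
# print(module_origin("thrift"))
```

If the first path still points under apps/beeswax/gen-py, the bundled code is still shadowing the one installed by pip.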

What our hue.ini config looked like

We used the hive interpreter in the Hue config, which PyHive extends. This is what the databricks+pyhive dialect/driver, installed with databricks-dbapi, uses with SQLAlchemy to establish a connection to Databricks.

[[[hive]]]
    name=Databricks
    interface=sqlalchemy
    options='{"url":"databricks+pyhive://token:<personal_token>@<host>:<port>/default","connect_args":{"cluster":"<cluster_name>"}}'
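
The options value is a JSON string wrapped in single quotes, so a quoting or URL mistake is easy to make. Below is a minimal sketch for sanity-checking such a string before starting Hue, assuming Python 3. The token, host, port and cluster values are hypothetical stand-ins for the placeholders above, and the snippet only inspects the string; it does not claim to mirror how Hue consumes it.

```python
import json
from urllib.parse import urlsplit

# Hypothetical values in place of the <personal_token>, <host>, <port> and
# <cluster_name> placeholders from the config above.
options = (
    '{"url":"databricks+pyhive://token:dapi-example@example.cloud.databricks.com:443/default",'
    '"connect_args":{"cluster":"example-cluster"}}'
)

parsed = json.loads(options)  # raises ValueError if the JSON is malformed
url = urlsplit(parsed["url"])

# Pull out the pieces that are easiest to get wrong.
summary = {
    "dialect": url.scheme,
    "user": url.username,
    "host": url.hostname,
    "port": url.port,
    "database": url.path.lstrip("/"),
    "cluster": parsed["connect_args"]["cluster"],
}
```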

romainr commented 4 years ago

Thanks for the report! There is an `if` in the latest master code, https://github.com/cloudera/hue/blob/master/apps/beeswax/gen-py/TCLIService/ttypes.py#L4457 so the line to update would be more like:

oprot.writeString(self.statement)

and you are proposing this?

oprot.writeString(self.statement.encode('utf-8'))

The Thrift code generated for py-hive can conflict with Hue's own, so the clean-up above is needed, but we should find a proper way to avoid this.

ebessah commented 4 years ago

Using the latest master code and looking at the problem again, there seems to be a discrepancy between the py-hive generated Thrift code, produced with the current Thrift compiler (0.13.0), and the one Hue maintains, generated with 0.9.3.

Any reason why Hue is still maintaining code generated with an older version of Thrift?

Thrift version 0.9.3 compiler

if self.statement is not None:
  oprot.writeFieldBegin('statement', TType.STRING, 2)
  if sys.version_info[0] > 2:
    oprot.writeBinary(self.statement)
  else:
    oprot.writeString(self.statement)  # Line 4457
  oprot.writeFieldEnd()

Thrift version 0.13.0 compiler

if self.statement is not None:
    oprot.writeFieldBegin('statement', TType.STRING, 2)
    oprot.writeString(self.statement.encode('utf-8') if sys.version_info[0] == 2 else self.statement)  # Line 3967
    oprot.writeFieldEnd()
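
A quick way to tell which compiler produced a given ttypes.py is to read its header comment. This is a sketch under the assumption that the generated files carry the usual `Autogenerated by Thrift Compiler (<version>)` header line; `thrift_compiler_version` is a hypothetical helper and `sample_header` is made-up input for illustration.

```python
import re

HEADER_RE = re.compile(r"Autogenerated by Thrift Compiler \(([^)]+)\)")

def thrift_compiler_version(source_text):
    """Return the compiler version named in a generated file's header, or None."""
    match = HEADER_RE.search(source_text)
    return match.group(1) if match else None

# Made-up header text in the assumed format.
sample_header = "#\n# Autogenerated by Thrift Compiler (0.9.3)\n#\n# DO NOT EDIT ...\n"
detected = thrift_compiler_version(sample_header)
```

Running the same function over the text of both copies of ttypes.py (Hue's and the one installed with py-hive) would show whether they were generated by different compiler versions.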

romainr commented 4 years ago

Hmm, indeed. IIRC no, it's just that 0.9 is the default version shipped in the Ubuntu packages, but it is very old now. Feel free to send a PR with the code recompiled with 0.13, or I can send one tomorrow!

romainr commented 4 years ago

https://issues.cloudera.org/browse/HUE-9175

ebessah commented 4 years ago

Perfect! Will do. Thanks for the response.