Closed by ConstantinoSchillebeeckx 3 years ago
I'm not a Databricks engineer, so I'm not familiar with their internals, but my guess is that Databricks uses a fork of Hive and made a frivolous change to the table description output header for the partitions.
In Hive, the value is hardcoded here: https://github.com/apache/hive/blob/fc2d47f85e03e9f1a2f79df34b826640062bbf6d/ql/src/java/org/apache/hadoop/hive/ql/ddl/table/info/desc/formatter/TextDescTableFormatter.java#L146
I think we can just override the `get_columns` method in the `DatabricksDialect` here to read:

```python
if col_name in ('# Partition Information', '# Partitioning'):
```

which would make it work for both Databricks' Hive and vanilla Hive.
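As a rough sketch of what that check accomplishes (this is illustrative standalone code, not the dialect's actual implementation; `parse_describe_rows` and the row shapes are made up for the example):

```python
# Illustrative sketch: stop consuming DESCRIBE rows at either partition
# header, so column parsing works against both vanilla Hive and
# Databricks' fork.
def parse_describe_rows(rows):
    """Return (name, type) pairs for the column section of DESCRIBE output."""
    columns = []
    for col_name, col_type, _comment in rows:
        # Vanilla Hive emits '# Partition Information';
        # Databricks emits '# Partitioning'.
        if col_name in ('# Partition Information', '# Partitioning'):
            break
        if col_name and not col_name.startswith('#'):
            columns.append((col_name, col_type))
    return columns

# Simulated Databricks DESCRIBE output
rows = [
    ('id', 'bigint', None),
    ('ds', 'string', None),
    ('# Partitioning', '', ''),
    ('Part 0', 'ds', ''),
]
print(parse_describe_rows(rows))  # [('id', 'bigint'), ('ds', 'string')]
```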
I need to make a few more changes and will provide a new release later today.
Hi there!
I'm going through your example of using this with SQLAlchemy. I've got a Databricks cluster spun up, have generated a PAT, and am able to successfully connect. However, whenever I try to do something like

I get

![image](https://user-images.githubusercontent.com/8518288/104103475-ea3e6980-5267-11eb-8b21-8700ec0998f4.png)

Digging a little bit, I've discovered that the return of
DESCRIBE my_table
returns something like:

whereas this line tells me it's expecting to break on "# Partition Information". Therefore `col_type` ends up being `None` when the `re.search` is run.

What am I missing? I've inherited this code, and I'm new to the whole pyspark/hive/etc space, so I'm not sure this library is even the culprit. Please let me know what other information I can provide. FWIW here's my `pip freeze`
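The failure mode described above can be reproduced in isolation. The sketch below is hypothetical (it is not this library's code; the row shapes and regex are illustrative), but it shows how an unrecognised partition header lets a row with a `None` type reach the `re.search` call:

```python
# Hypothetical reproduction: if the parser only recognises vanilla Hive's
# '# Partition Information' header, a Databricks '# Partitioning' row falls
# through with col_type still None, and re.search then raises a TypeError.
import re

def broken_parse(rows):
    types = []
    for col_name, col_type in rows:
        if col_name == '# Partition Information':  # never matches on Databricks
            break
        # col_type may be None here, which re.search rejects
        types.append(re.search(r'^\w+', col_type).group(0))
    return types

databricks_rows = [
    ('id', 'bigint'),
    ('# Partitioning', None),  # Databricks' header slips past the check
]

try:
    broken_parse(databricks_rows)
except TypeError as exc:
    print('parse failed:', exc)
```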