mikealfare opened this issue 2 months ago
@benc-db This is the issue we were talking about yesterday regarding the Databricks Metadata API. Is this just a Databricks-specific issue?
It is Databricks-specific, but may affect dbt-spark as well.
lol, I didn't see where I was commenting. So, I do not know the extent to which describe extended is standard Spark vs Databricks, which is probably what you're asking here.
@benc-db yup :)
@mikealfare did you find this bug running on Databricks then?
@amychen1776 Apologies for the late reply; my GH notifications have been out of control. I believe this was reported by a Cloud customer that was running dbt-spark with Databricks.
I'll summarize here what I'm doing in dbt-databricks: in 1.9 I'm introducing a behavior flag to use the information schema to get column types for UC tables. The reason I'm guarding it with a flag is that I learned in testing that the information schema is not always synced up with reality, and to ensure that it is, I run a repair table operation before gathering columns, which adds overhead. I'm hopeful that I can remove the flag once sync gets better for the information schema, because in my testing I hit missing columns between successive dbt runs spaced only minutes apart...too long for me to feel comfortable trusting it for this.
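For readers following along, here is a minimal sketch of my reading of that approach on Databricks. This is not the exact SQL dbt-databricks emits, and the availability of `REPAIR TABLE ... SYNC METADATA` and of the `full_data_type` column in `information_schema.columns` should be verified against your runtime:

```sql
-- Sketch only, not the exact dbt-databricks implementation.
-- Force the information schema to catch up with the table's real schema
-- (this is the overhead mentioned above):
REPAIR TABLE my_catalog.my_schema.my_table SYNC METADATA;

-- Then read column types from the information schema, which does not
-- truncate complex types the way DESCRIBE EXTENDED output can:
SELECT column_name, full_data_type
FROM my_catalog.information_schema.columns
WHERE table_schema = 'my_schema'
  AND table_name = 'my_table'
ORDER BY ordinal_position;
```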
Hi, I'm not sure if I encountered the same issue. I got a runtime error when adding a struct column to an incremental model on dbt-spark. Here's the error:
```
Runtime Error
  [PARSE_SYNTAX_ERROR] Syntax error at or near ','.(line 7, pos 34)

  == SQL ==
  /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "main_spark", "target_name": "dev", "node_id": "model.main.evens_only"} */
  alter table test_db.evens_only_spark
  add columns
  struct_test struct<,... 1 more fields>
  ----------------------------------^^^
```
It seems the data read by the `parse_describe_extended` function is `[<agate.Row: ('id', 'int', None)>, <agate.Row: ('struct_test', 'struct<,... 1 more fields>', None)>]`. I don't know why the struct type doesn't show the internal fields.
This impacts unit testing as well. I can't provide test values for my complex type because the `,... $N more fields>` artifact gets compiled into the generated `cast` statement.
I'm not using Databricks.
Is this a new bug in dbt-spark?
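If the plain-Spark case above is the same truncation, the `,... N more fields` pattern matches how Spark abbreviates wide types when rendering them, which, to my understanding, is governed by the `spark.sql.debug.maxToStringFields` setting (default 25 in Spark 3.x). Under that assumption, raising it for the session is a possible stopgap:

```sql
-- Possible stopgap, assuming the truncation is governed by
-- spark.sql.debug.maxToStringFields (default 25 in Spark 3.x):
SET spark.sql.debug.maxToStringFields = 1000;

-- DESCRIBE EXTENDED should now return the full struct type string
DESCRIBE EXTENDED test_db.evens_only_spark;
```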
Current Behavior
Complex types are truncated when running this macro: https://github.com/dbt-labs/dbt-spark/blob/3fc624cb99488e803956304c9dea2c10facab08d/dbt/include/spark/macros/adapters.sql#L281-L286
This happens due to `DESCRIBE EXTENDED`, which truncates the results before returning them.
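One possible direction for the macro, hedged since I haven't verified it against every Spark version dbt-spark supports: `SHOW CREATE TABLE` returns the full DDL, including complete complex types, and could be parsed as an untruncated fallback:

```sql
-- SHOW CREATE TABLE renders the full DDL and, unlike the data_type
-- column of DESCRIBE EXTENDED, does not abbreviate nested types:
SHOW CREATE TABLE my_model;
-- e.g. CREATE TABLE my_model (struct_test STRUCT<a: INT, b: STRING>, ...)
```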
Expected Behavior
The types should be complete.
Steps To Reproduce
DESCRIBE EXTENDED my_model
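For a self-contained reproduction of the truncation itself (hypothetical table name; lowering `spark.sql.debug.maxToStringFields` simulates a struct wide enough to exceed the default limit, assuming that setting is what drives the abbreviation):

```sql
-- Force truncation on a small struct by lowering the limit; by default
-- a struct needs more than ~25 fields before Spark abbreviates it:
SET spark.sql.debug.maxToStringFields = 1;

CREATE TABLE demo_truncation (struct_test STRUCT<a: INT, b: STRING>) USING parquet;

-- The data_type column comes back as something like
-- struct<,... 2 more fields> instead of struct<a:int,b:string>:
DESCRIBE EXTENDED demo_truncation;
```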
Relevant log output
No response
Environment
Additional Context
No response