dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

[Bug] Index errors when using split_part #1132

Open benc-db opened 3 weeks ago


Is this a new bug in dbt-spark?

Current Behavior

When the `split_part` macro is called with a part index that is out of bounds and ANSI mode (`spark.sql.ansi.enabled`) is on, it raises an exception.

Expected Behavior

Per the tests in `BaseSplitPart` in the adapter tests, the expectation is that this macro can be invoked with part indexes greater than the number of parts produced without throwing an exception — specifically for this row in the seed:

,|,,,,

We can accommodate this behavior by using `get` rather than indexing the array, but `get` is only available in Spark 3.4.0 or later.
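The difference can be sketched in Spark SQL (assuming the macro currently compiles to bracket indexing of the array returned by `split`; the literal values here are illustrative, not taken from the macro itself):

```sql
-- Under ANSI mode, out-of-bounds array access is an error instead of NULL
SET spark.sql.ansi.enabled = true;

-- Bracket indexing raises INVALID_ARRAY_INDEX for an out-of-bounds index
SELECT split('a|b|c', '\\|')[5];

-- get() (Spark 3.4.0+) returns NULL for an out-of-bounds index instead of raising
SELECT get(split('a|b|c', '\\|'), 5);
```

Switching the macro's compiled SQL from the bracket form to `get` would preserve the lenient behavior the adapter tests expect, at the cost of requiring Spark 3.4.0+.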

Steps To Reproduce

  1. Set `spark.sql.ansi.enabled=true`
  2. Call `split_part` with an out-of-bounds part index
  3. Observe the exception
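A minimal dbt model reproducing the steps above might look like this (assuming the cross-database `split_part` macro signature from dbt-core, `split_part(string_text, delimiter_text, part_number)`; the literal values are hypothetical):

```sql
-- models/repro_split_part.sql
-- With spark.sql.ansi.enabled=true on the target cluster, requesting part 5
-- of a 3-part string triggers the out-of-bounds exception described above.
select
    {{ dbt.split_part("'a|b|c'", "'|'", 5) }} as fifth_part
```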

Relevant log output

No response

Environment

This issue has existed for a while, but I'm only hitting it now due to new defaults in a Databricks environment I was asked to test against.

Additional Context

No response