databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
226 stars 119 forks source link

Update split part to use safe operation #839

Closed benc-db closed 3 weeks ago

benc-db commented 3 weeks ago

Description

Hit issue with internal testing; turns out that the split_part macro in dbt-spark uses unsafe indexing (which is a totally reasonable choice), but the adapter tests expect it to be safe to index beyond the number of matched parts. Switching to 'get' fixes the issue.

Checklist

jackyhu-db commented 3 weeks ago

is this a new file? If yes, where is it used?

benc-db commented 3 weeks ago

is this a new file? If yes, where is it used?

Hi Jacky. This is basically a copy-paste from the spark macro so that I can replace array indexing with the 'get' function. As such, I haven't actually validated the regex behavior, which apparently works since it passes the dbt-core tests, and no one is complaining about that. I'm just making it so that if you ask for index 3, and there are only 2 matches, you get NULL instead of raising an exception, since that is what is specified in the adapter tests from dbt.

benc-db commented 3 weeks ago

is this a new file? If yes, where is it used?

Hi Jacky. This is basically a copy-paste from the spark macro so that I can replace array indexing with the 'get' function. As such, I haven't actually validated the regex behavior, which apparently works since it passes the dbt-core tests, and no one is complaining about that. I'm just making it so that if you ask for index 3, and there are only 2 matches, you get NULL instead of raising an exception, since that is what is specified in the adapter tests from dbt.

I spoke with @mikealfare about bringing the change into dbt-spark, but he said that since this function is only in recent versions of spark, and unlike us, he can't presume a particular spark version, that for now the fix should be in dbt-databricks.