Open yaroslav-ost opened 5 months ago
Hi, I think the function you want to use is "month", not "months".
Hi @aiss93,
It must be "months". See https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by
Those names were used in older versions of Iceberg, so check your Iceberg version. The error message you're getting says that the function you're using wasn't found in the catalog. I tried with "month" and I get the error: Transform is not supported
From my experience, the problem is with PySpark:

hour() - extracts the hour component; this is the one to use inside SQL queries, including the ORDER BY clause.
https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.functions.hour.html

hours() - used solely for partition transforms, so it should only appear in PARTITIONED BY.
https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.functions.hours.html

dbt-glue does not support this, since it uses the same expression in both ORDER BY and PARTITIONED BY.
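To make the hour()/hours() distinction concrete, here is a sketch of where each belongs in Spark SQL. The catalog, table, and column names are illustrative, not from the issue:

```python
# hour() is a regular Spark SQL function, so it is valid in SELECT
# and ORDER BY (illustrative table/column names):
query_sql = """
SELECT id, hour(event_ts) AS h
FROM glue_catalog.db.events
ORDER BY hour(event_ts)
"""

# hours() is an Iceberg partition transform, valid ONLY inside
# PARTITIONED BY of a CREATE TABLE ... USING iceberg statement:
create_sql = """
CREATE TABLE glue_catalog.db.events (id bigint, event_ts timestamp)
USING iceberg
PARTITIONED BY (hours(event_ts))
"""

# Putting the transform form (hours/months/...) into ORDER BY, as the
# generated dbt-glue SQL does, is what triggers
# "AnalysisException: Undefined function".
print(query_sql)
print(create_sql)
```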
Hi guys,
It seems we are not able to use Iceberg partition transforms inside dbt models.
Table Config:
{{ config(
    materialized='incremental',
    incremental_strategy='append',
    file_format='iceberg',
    partition_by=['months(my_date)']
) }}

Output:
CREATE TABLE glue_catalog.test.test_table
USING iceberg
PARTITIONED BY (months(my_date))
LOCATION 's3://sandbox-source/client/my/test/test_table'
AS SELECT * FROM tmp_test_table ORDER BY (months(my_date))
Error: AnalysisException: Undefined function: months. This function is neither a built-in/temporary function, nor a persistent function that is qualified as spark_catalog.default.months.
I guess ORDER BY is the root cause, since partition transforms like months() aren't supported as functions in an ORDER BY clause. https://github.com/aws-samples/dbt-glue/blob/a52fcc49f93fcb3405a60a0cd627ca8d91726749/dbt/adapters/glue/impl.py#L874-L882
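One possible direction for a fix (a hedged sketch, not actual dbt-glue code): keep the transform in PARTITIONED BY, but reduce each date-based transform entry to its bare column before building the ORDER BY clause, since Spark SQL has no years/months/days/hours functions of that form. The helper name and regex below are hypothetical; bucket(n, col) and truncate(n, col) would need similar handling:

```python
import re

# Matches Iceberg date/time partition transforms: years(col), months(col),
# days(col), hours(col). bucket()/truncate() are intentionally not covered
# in this sketch.
TRANSFORM_RE = re.compile(r"^\s*(?:years|months|days|hours)\(\s*(\w+)\s*\)\s*$")

def order_by_columns(partition_by):
    """Return the column list to use in ORDER BY: transform entries are
    reduced to the wrapped column; plain column entries pass through."""
    cols = []
    for entry in partition_by:
        m = TRANSFORM_RE.match(entry)
        cols.append(m.group(1) if m else entry)
    return cols

# Example with the partition_by from the model config above:
print(order_by_columns(["months(my_date)"]))  # -> ['my_date']
```

Ordering by the raw column still clusters rows into the same partitions that months(my_date) produces, so the CTAS would no longer hit "Undefined function: months".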