Closed by willi-mueller 9 months ago
@willi-mueller there's a hardcoded expression to create partitions. Very old code :) With your repros it will be an easy fix, thanks :) We'll double down on the BigQuery resource adapter so that various settings for partitioning and other column hints can be set in a simple way (like we do for vector databases with embeddings).
To fix this, we need to take into account the data type of the partition column in bigquery.py. This should also be tested in the query builder tests and in a pipeline test (using the test cases above).
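As a rough illustration of what "taking the data type into account" could mean, here is a minimal sketch. The helper name `partition_clause`, its parameters, and the default integer range are hypothetical and are not dlt's actual code in bigquery.py; the data type names (`date`, `timestamp`, `bigint`) are assumed to match dlt's column hints. The emitted clauses follow BigQuery's DDL: a DATE column can be used directly, a TIMESTAMP column is truncated to a day with `DATE(...)`, and an integer column needs `RANGE_BUCKET` over a `GENERATE_ARRAY` of bucket boundaries.

```python
from typing import Tuple


def partition_clause(
    column: str,
    data_type: str,
    int_range: Tuple[int, int, int] = (0, 1_000_000, 10_000),  # assumed (start, end, interval)
) -> str:
    """Hypothetical helper: pick a BigQuery PARTITION BY clause by column type."""
    quoted = f"`{column}`"
    if data_type == "date":
        # DATE columns can be used as the partitioning expression directly.
        return f"PARTITION BY {quoted}"
    if data_type == "timestamp":
        # TIMESTAMP columns are converted to a date for daily partitions.
        return f"PARTITION BY DATE({quoted})"
    if data_type == "bigint":
        # Integer columns use range partitioning with explicit bucket boundaries.
        start, end, interval = int_range
        return (
            f"PARTITION BY RANGE_BUCKET({quoted}, "
            f"GENERATE_ARRAY({start}, {end}, {interval}))"
        )
    raise ValueError(f"Cannot partition by column {column!r} of type {data_type!r}")


print(partition_clause("my_date_column", "date"))
print(partition_clause("my_datetime_column", "timestamp"))
print(partition_clause("my_int_column", "bigint", (0, 100, 10)))
```

Raising on unsupported types, rather than falling back to one hardcoded expression, would surface a bad partition hint before the DDL reaches BigQuery.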
Hope you don't mind, but I've added a date test as well, since technically the above only tests for datetime:
@pytest.mark.parametrize(
    "destination_config",
    destinations_configs(all_staging_configs=True, subset=["bigquery"]),
    ids=lambda x: x.name,
)
def test_bigquery_partition_by_date(destination_config: DestinationTestConfiguration) -> None:
    pipeline = destination_config.setup_pipeline(f"bigquery_{uniq_id()}", full_refresh=True)

    @dlt.resource(
        write_disposition="merge",
        primary_key="my_date_column",
        columns={"my_date_column": {"data_type": "date", "partition": True, "nullable": False}},
    )
    def demo_resource() -> Iterator[Dict[str, Union[int, pendulum.DateTime]]]:
        for i in range(10):
            yield {
                "my_date_column": pendulum.from_timestamp(1700784000 + i * 50_000).date(),
                "metric": i,
            }

    @dlt.source(max_table_nesting=0)
    def demo_source() -> DltResource:
        return demo_resource

    pipeline.run(demo_source())
I added this because BigQuery handles these types differently, so they need different logic.
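For a sense of which partition values the date test feeds in, the timestamps can be reproduced with the standard library alone. This is only a sketch: it assumes `pendulum.from_timestamp` interprets the timestamp as UTC (its default), and the constant names are made up for illustration.

```python
from datetime import datetime, timezone

BASE_TS = 1700784000    # first timestamp used in the test
STEP_SECONDS = 50_000   # spacing between consecutive rows

# Same arithmetic as the test resource, using datetime instead of pendulum.
dates = [
    datetime.fromtimestamp(BASE_TS + i * STEP_SECONDS, tz=timezone.utc).date()
    for i in range(10)
]
distinct_dates = sorted(set(dates))

print(len(dates), "rows on", len(distinct_dates), "distinct dates")
for d in distinct_dates:
    print(d.isoformat(), dates.count(d))
```

Because the step is shorter than a day, the ten rows land on six calendar dates (2023-11-24 through 2023-11-29), so several rows share the same value in `my_date_column`.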
dlt version
0.3.25, 0.4.1a2
Describe the problem
When I specify a column as a partition key in the BigQuery destination, we observe runtime errors. It seems that the dlt library does not create a valid DDL statement with a correct partition by specification.
Expected behavior
When specifying a partition column, the resulting table is partitioned by date and the loading does not crash.
Steps to reproduce
Partition by Date
Run this pipeline trying to set a date column as partition key:
Code
Exception
Partition by Integer
Code
Exception
Operating system
macOS
Runtime environment
Local
Python version
3.11
dlt data source
not applicable
dlt destination
Google BigQuery
Other deployment details
No response
Additional information
No response