neuromantik33 opened 1 day ago
@Pipboyguy Update: when I remove the BigQuery schema auto-detection, everything works fine:
```python
...
bigquery_adapter(t_resource, partition=cursor_path)  # autodetect_schema=True)  # type: ignore[arg-type]
...
```
And FYI, my current (ugly) workaround is to do my own table detection, like so:
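```python
from collections.abc import Iterable

from dlt import Pipeline
from dlt.destinations.exceptions import DestinationUndefinedEntity


def missing_destination_tables(pipeline: Pipeline, table_names: Iterable[str]) -> set[str]:
    """Return the names in `table_names` that do not exist yet in the destination dataset."""
    with pipeline.sql_client() as c:

        def table_exists(table: str) -> bool:
            q_name = c.make_qualified_table_name(table)
            try:
                # Cheap probe; LIMIT 1 avoids fetching every row of an existing table
                c.execute_sql(f"SELECT 1 FROM {q_name} LIMIT 1")
                return True
            except DestinationUndefinedEntity:
                return False

        return {table for table in table_names if not table_exists(table)}

...
```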
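For anyone without a BigQuery project at hand, the probe-and-catch idea behind `missing_destination_tables` can be tried locally. This is a stand-in, not dlt code: it assumes Python's built-in `sqlite3` in place of the dlt SQL client, and `sqlite3.OperationalError` (checked for its "no such table" message, since SQLite uses that error class for other failures too) in place of `DestinationUndefinedEntity`. The helper name `missing_tables` is hypothetical.

```python
import sqlite3
from collections.abc import Iterable


def missing_tables(conn: sqlite3.Connection, table_names: Iterable[str]) -> set[str]:
    """Hypothetical SQLite stand-in: return the names in `table_names` that do not exist."""

    def table_exists(table: str) -> bool:
        # Quote the identifier the way SQLite expects (double quotes, embedded quotes doubled)
        q_name = '"' + table.replace('"', '""') + '"'
        try:
            conn.execute(f"SELECT 1 FROM {q_name} LIMIT 1").fetchall()
            return True
        except sqlite3.OperationalError as exc:
            # SQLite reports a missing table as "no such table: <name>"; re-raise anything else
            if "no such table" in str(exc):
                return False
            raise

    return {table for table in table_names if not table_exists(table)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    print(sorted(missing_tables(conn, ["users", "orders"])))  # ['orders']
```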
```python
if __name__ == "__main__":
    import dlt
    from dlt import Pipeline
    from nxt.config import __config__ as cfg

    # `tables_configs` and `apply_db` are defined in the elided part of the script
    pipeline: Pipeline = dlt.pipeline(
        pipeline_name="apply_db",
        destination="bigquery",
        staging="filesystem",
        dataset_name=cfg.apply_db.dataset,
    )
    # Tables that do not exist yet in the destination dataset
    tables_to_append = missing_destination_tables(pipeline, tables_configs.keys())
    print(f"Missing tables: {tables_to_append}")

    source = apply_db(f"{cfg.apply_db.dsn}")
    for t_name, rsrc in source.resources.items():
        # Switch to merge only for tables that already exist and have no cursor;
        # missing tables keep their default disposition for the initial load
        if t_name not in tables_to_append and tables_configs[t_name].get("cursor_path") is None:
            rsrc.apply_hints(write_disposition="merge")
    load_info = pipeline.run(source)
```
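The selection rule in the loop above can be pulled out into a plain function, which makes it easy to check which resources end up on `merge` without running a pipeline. This is a hypothetical refactor: `plan_write_dispositions` is not part of dlt, the shape of `tables_configs` (table name mapped to a dict with an optional `cursor_path`) is inferred from how the script indexes it, and it assumes untouched resources stay on dlt's default `append`.

```python
from collections.abc import Mapping
from typing import Any


def plan_write_dispositions(
    tables_configs: Mapping[str, Mapping[str, Any]],
    missing_tables: set[str],
) -> dict[str, str]:
    """Mirror the workaround: merge only tables that already exist and have no cursor_path."""
    plan: dict[str, str] = {}
    for t_name, config in tables_configs.items():
        if t_name not in missing_tables and config.get("cursor_path") is None:
            plan[t_name] = "merge"
        else:
            # New tables (initial load) and incremental tables keep the assumed default, append
            plan[t_name] = "append"
    return plan


if __name__ == "__main__":
    configs = {"users": {}, "events": {"cursor_path": "updated_at"}, "orders": {}}
    print(plan_write_dispositions(configs, missing_tables={"orders"}))
    # {'users': 'merge', 'events': 'append', 'orders': 'append'}
```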
dlt version
1.0.0
Describe the problem
For some reason that escapes me, at some point I was no longer able to load any data into BigQuery using a `write_disposition` other than `append`. The reason is that `replace` and `merge` both truncate some tables, and, for reasons unknown to me, dlt is no longer able to detect that those tables don't exist, so the pipeline fails.
Expected behavior
The expected behavior is to be able to do an initial load into a non-existent dataset using a `write_disposition` other than `append`.
Steps to reproduce
Here is my test script:
and here is the resulting output (I've also added some extra logging to show the queries that are failing):
Operating system
Linux
Runtime environment
Local
Python version
3.11
dlt data source
sql_database
dlt destination
Google BigQuery
Other deployment details
If I change the `write_disposition` to `append`, everything works fine.
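With `append` nothing is truncated, which may be why it works; per the problem description, `replace` and `merge` truncate some tables, and that is where the missing tables break the load. The sketch below is a hypothetical illustration of the expected behavior, not dlt's implementation: a guard that skips truncation for tables that do not exist yet. It assumes Python's built-in `sqlite3` as a stand-in for BigQuery and uses `DELETE FROM` in place of `TRUNCATE` (SQLite has no `TRUNCATE`); the helper name `truncate_existing` is made up for illustration.

```python
import sqlite3
from collections.abc import Iterable


def truncate_existing(conn: sqlite3.Connection, table_names: Iterable[str]) -> list[str]:
    """Empty every listed table that exists; return the names skipped because they are missing."""
    existing = {
        row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    skipped: list[str] = []
    for table in table_names:
        if table not in existing:
            # Initial load: there is nothing to truncate, so skip instead of failing
            skipped.append(table)
            continue
        q_name = '"' + table.replace('"', '""') + '"'
        conn.execute(f"DELETE FROM {q_name}")  # SQLite's equivalent of TRUNCATE
    return skipped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("INSERT INTO users VALUES (1), (2)")
    print(truncate_existing(conn, ["users", "orders"]))  # ['orders']
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
```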
Additional information
I can start removing all of the type hints and BigQuery tuning to try to isolate the change that introduced this, but I really need the options I'm setting :sweat_smile: