dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

DuckDB connection_config parameter not being recognized #23670

Open EvanZ opened 3 months ago

EvanZ commented 3 months ago

Dagster version

dagster, version 1.7.16

What's the issue?

I am trying to create a DuckDBResource that can read data from S3:

from dagster_duckdb import DuckDBResource

connection_config = {
    "s3_region": "us-west-2",
    "s3_access_key_id": "your_access_key_here",
    "s3_secret_access_key": "your_secret_key_here",
    # "s3_session_token": "your_session_token_here",  # Only include if necessary
}

print("DuckDB connection_config:", connection_config)

duckdb_resource = DuckDBResource(
    database="pipelines/duckdb_file.db",
    connection_config=connection_config
)

However, when I run my pipeline, I get the following error:

dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "load_data":

  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 282, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 494, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 183, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 88, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 198, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context):
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 167, in _yield_compute_results
    for event in iterate_with_context(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 475, in iterate_with_context
    return
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
duckdb.duckdb.InvalidInputException: Invalid Input Error: Unrecognized configuration property ",s3_secret_access_key"

  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 473, in iterate_with_context
    next_output = next(iterator)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 141, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/home/ubuntu/boosting-behavior-models/pipelines/assets/__init__.py", line 28, in load_data
    with duckdb.get_connection() as conn:
  File "/usr/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster_duckdb/resource.py", line 51, in get_connection
    conn = backoff(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/backoff.py", line 57, in backoff
    return fn(*args, **kwargs)

It's this part that I can't figure out: duckdb.duckdb.InvalidInputException: Invalid Input Error: Unrecognized configuration property ",s3_secret_access_key". The property name has an extraneous leading comma that is not in my code.

What did you expect to happen?

I expected the resource to work with S3 credentials specified in connection_config.

How to reproduce?

Create a DuckDBResource whose connection_config specifies S3 credentials, then run a simple query that reads data from S3.

Deployment type

Local

Deployment details

No response

Additional information

No response


EvanZ commented 3 months ago

I want to add that even if I remove the two secret keys and keep only s3_region (plus threads), it still raises an error:

duckdb_resource = DuckDBResource(
    database="pipelines/duckdb_file.db",
    connection_config={
        "s3_region": "us-west-2",
        "threads": 1,
    },
)
dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "load_data":

  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_plan.py", line 282, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 494, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 183, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/execute_step.py", line 88, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 198, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context):
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute.py", line 167, in _yield_compute_results
    for event in iterate_with_context(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 475, in iterate_with_context
    return
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
duckdb.duckdb.InvalidInputException: Invalid Input Error: Unrecognized configuration property "s3_region"

  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/__init__.py", line 473, in iterate_with_context
    next_output = next(iterator)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 141, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/home/ubuntu/boosting-behavior-models/pipelines/assets/__init__.py", line 28, in load_data
    with duckdb.get_connection() as conn:
  File "/usr/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster_duckdb/resource.py", line 51, in get_connection
    conn = backoff(
  File "/home/ubuntu/boosting-behavior-models/.env/lib/python3.9/site-packages/dagster/_utils/backoff.py", line 57, in backoff
    return fn(*args, **kwargs)

If I specify only threads in connection_config, I don't get this error, so as far as I can tell the problem is specific to the S3 settings. I've tried hard-coding the values, reading them from environment variables, etc. None of the S3 settings work inside connection_config.

EvanZ commented 3 months ago

I should also note that creating a persistent secret for S3 does seem to work.
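Since the persistent secret route works, here is a sketch of building that statement in Python, assuming DuckDB's secrets syntax `CREATE [PERSISTENT] SECRET name (TYPE S3, KEY_ID ..., SECRET ..., REGION ...)` available in recent DuckDB releases. `build_s3_secret_sql` is a hypothetical helper; `name` is inserted unquoted, so it should be a plain identifier.

```python
from typing import Optional


def _quote(value: str) -> str:
    # Double embedded single quotes so the value is a valid SQL string literal.
    return "'" + value.replace("'", "''") + "'"


def build_s3_secret_sql(
    name: str,
    key_id: str,
    secret: str,
    region: str,
    session_token: Optional[str] = None,
    persistent: bool = True,
) -> str:
    """Hypothetical helper: build a DuckDB CREATE SECRET statement for S3."""
    params = [
        "TYPE S3",
        f"KEY_ID {_quote(key_id)}",
        f"SECRET {_quote(secret)}",
        f"REGION {_quote(region)}",
    ]
    if session_token is not None:
        # Only include a session token when temporary credentials are used.
        params.append(f"SESSION_TOKEN {_quote(session_token)}")
    kind = "PERSISTENT SECRET" if persistent else "SECRET"
    return f"CREATE {kind} {name} ({', '.join(params)});"


if __name__ == "__main__":
    print(build_s3_secret_sql("my_s3", "your_access_key_here", "your_secret_key_here", "us-west-2"))
```

The resulting string could be passed to `conn.execute(...)`; because a persistent secret is stored on disk, later connections should be able to reuse it without the credentials appearing in connection_config.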