dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Sqlalchemy staging dataset support and docs #1841

Closed steinitzu closed 1 week ago

steinitzu commented 1 week ago


steinitzu commented 1 week ago

Docs are looking good! Please enable the tests for replace strategies, which all start like this:

@pytest.mark.parametrize("replace_strategy", REPLACE_STRATEGIES)
def test_replace_disposition(
    destination_config: DestinationTestConfiguration, replace_strategy: str
) -> None:
    if not destination_config.supports_merge and replace_strategy != "truncate-and-insert":
        pytest.skip(
            f"Destination {destination_config.name} does not support merge and thus"
            f" {replace_strategy}"
        )

Ah, I see, they're not running without merge. I saw the TODO to add a supported_replace_strategies capability, so I'll just add that.
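
A rough sketch of how the skip condition could look once such a capability exists. `FakeCapabilities` and `should_skip` are hypothetical stand-ins for illustration, not dlt's actual classes or test helpers; only the attribute name and the strategy names come from this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeCapabilities:
    """Hypothetical stand-in for a destination's capabilities object."""

    # Attribute name assumed from the TODO mentioned above.
    supported_replace_strategies: List[str] = field(
        default_factory=lambda: ["truncate-and-insert"]
    )


def should_skip(caps: FakeCapabilities, replace_strategy: str) -> bool:
    """Skip a test only when the destination does not declare the strategy.

    Unlike the current check, this does not depend on merge support.
    """
    return replace_strategy not in caps.supported_replace_strategies


# A destination without merge support that still handles staging-based replace.
sqlalchemy_like = FakeCapabilities(
    supported_replace_strategies=["truncate-and-insert", "insert-from-staging"]
)
```

With this shape, a destination that cannot merge would still run the `insert-from-staging` tests as long as it lists that strategy.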

Also, could you change one of the replace tests to use the write_disposition spec as a dictionary? Right now we use an env variable to set the strategy. Both should work:

@dlt.resource(name="items", write_disposition="replace", primary_key="id")
def load_items():

Sure!

steinitzu commented 1 week ago

Also, could you change one of the replace tests to use the write_disposition spec as a dictionary? Right now we use an env variable to set the strategy. Both should work:

@dlt.resource(name="items", write_disposition="replace", primary_key="id")
def load_items():

@rudolfix is this supported yet?
@dlt.resource(write_disposition=dict(disposition='replace', strategy='insert-from-staging')) doesn't seem to affect anything. For merge, the strategy is stored in table hints such as x-merge-strategy, but I don't see anything equivalent for replace.
I'd leave this for another PR if it's not yet implemented.
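
To illustrate the gap, here is a minimal sketch of how a string or dict write_disposition might be flattened into table hints. `disposition_to_hints` is a hypothetical helper, not part of dlt; `x-merge-strategy` is the hint named above, while an `x-replace-strategy` key is purely an assumption about what an equivalent could look like.

```python
from typing import Any, Dict, Union


def disposition_to_hints(
    write_disposition: Union[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Turn a string or dict write_disposition into flat table hints (illustrative only)."""
    if isinstance(write_disposition, str):
        return {"write_disposition": write_disposition}

    disposition = write_disposition["disposition"]
    hints: Dict[str, Any] = {"write_disposition": disposition}

    strategy = write_disposition.get("strategy")
    if strategy is not None:
        # "x-merge-strategy" is the hint mentioned in this thread; for any other
        # disposition the "x-<disposition>-strategy" key is an assumption.
        hints[f"x-{disposition}-strategy"] = strategy
    return hints


merge_hints = disposition_to_hints({"disposition": "merge", "strategy": "upsert"})
replace_hints = disposition_to_hints(
    {"disposition": "replace", "strategy": "insert-from-staging"}
)
```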

rudolfix commented 1 week ago

OK, right. We have this defined only for merge, but it is prepared for other dispositions:

TWriteDisposition = Literal["skip", "append", "replace", "merge"]
TLoaderMergeStrategy = Literal["delete-insert", "scd2", "upsert"]

WRITE_DISPOSITIONS: Set[TWriteDisposition] = set(get_args(TWriteDisposition))
MERGE_STRATEGIES: Set[TLoaderMergeStrategy] = set(get_args(TLoaderMergeStrategy))

DEFAULT_VALIDITY_COLUMN_NAMES = ["_dlt_valid_from", "_dlt_valid_to"]
"""Default values for validity column names used in `scd2` merge strategy."""

class TWriteDispositionDict(TypedDict):
    disposition: TWriteDisposition

class TMergeDispositionDict(TWriteDispositionDict, total=False):
    strategy: Optional[TLoaderMergeStrategy]
    validity_column_names: Optional[List[str]]
    active_record_timestamp: Optional[TAnyDateTime]
    boundary_timestamp: Optional[TAnyDateTime]
    row_version_column_name: Optional[str]

TWriteDispositionConfig = Union[TWriteDisposition, TWriteDispositionDict, TMergeDispositionDict]

OK to move it to a follow-up ticket.
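
For the follow-up, one possible shape is sketched below, mirroring TMergeDispositionDict. `TLoaderReplaceStrategy`, `TReplaceDispositionDict`, the `SKETCH_REPLACE_STRATEGIES` set, and `check_replace_config` are all hypothetical names that do not exist in the code quoted above; the strategy values are the ones dlt's replace_strategy setting accepts.

```python
from typing import Literal, Optional, Set, TypedDict, get_args

TWriteDisposition = Literal["skip", "append", "replace", "merge"]
# Hypothetical alias, modeled on TLoaderMergeStrategy.
TLoaderReplaceStrategy = Literal[
    "truncate-and-insert", "insert-from-staging", "staging-optimized"
]
SKETCH_REPLACE_STRATEGIES: Set[TLoaderReplaceStrategy] = set(
    get_args(TLoaderReplaceStrategy)
)


class TWriteDispositionDict(TypedDict):
    disposition: TWriteDisposition


class TReplaceDispositionDict(TWriteDispositionDict, total=False):
    """Hypothetical counterpart of TMergeDispositionDict for replace."""

    strategy: Optional[TLoaderReplaceStrategy]


def check_replace_config(config: TReplaceDispositionDict) -> TReplaceDispositionDict:
    """Reject configs with a wrong disposition or an unknown strategy."""
    if config["disposition"] != "replace":
        raise ValueError(f"expected 'replace', got {config['disposition']!r}")
    strategy = config.get("strategy")
    if strategy is not None and strategy not in SKETCH_REPLACE_STRATEGIES:
        raise ValueError(f"unknown replace strategy {strategy!r}")
    return config


cfg = check_replace_config(
    {"disposition": "replace", "strategy": "insert-from-staging"}
)
```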