dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Audit AssetMaterializations in integrations #3235

Closed schrockn closed 1 year ago

schrockn commented 3 years ago

We should make all the AssetMaterializations in our integrations end up producing sensible results in the asset catalog. E.g.:

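# Imports assumed from the Dagster release current when this issue was filed.
import pandas as pd
from dagster import (
    AssetMaterialization,
    Field,
    Selector,
    StringSource,
    check,
    dagster_type_materializer,
)
from dagster.utils import dict_without_keys


@dagster_type_materializer(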
    Selector(
        {
            "csv": {
                "path": StringSource,
                "sep": Field(StringSource, is_required=False, default_value=","),
            },
            "parquet": {"path": StringSource},
            "table": {"path": StringSource},
            "pickle": {"path": StringSource},
        },
    )
)
def dataframe_materializer(_context, config, pandas_df):
    check.inst_param(pandas_df, "pandas_df", pd.DataFrame)
    file_type, file_options = list(config.items())[0]

    if file_type == "csv":
        path = file_options["path"]
        pandas_df.to_csv(path, index=False, **dict_without_keys(file_options, "path"))
    elif file_type == "parquet":
        pandas_df.to_parquet(file_options["path"])
    elif file_type == "table":
        pandas_df.to_csv(file_options["path"], sep="\t", index=False)
    elif file_type == "pickle":
        pandas_df.to_pickle(file_options["path"])
    else:
        check.failed("Unsupported file_type {file_type}".format(file_type=file_type))

    return AssetMaterialization.file(file_options["path"])

in dagster_pandas

I'm actually not sure how this asset materialization appears in the asset catalog at this point, but it probably doesn't make sense.

sryza commented 3 years ago

AssetMaterializations in integrations:

Starter thoughts on what to do with these:

sryza commented 1 year ago

Going to close this because our asset-related integrations now focus on software-defined assets (SDAs) instead of dynamic AssetMaterializations. Since this issue was filed, we added load_assets_from_dbt_project, define_dagstermill_asset, SDA-compatible IO managers for Pandas/PySpark/BigQuery/Snowflake/DuckDB, and made our S3/GCP/Azure IO managers SDA-compatible.