fal-ai / dbt-fal

do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
https://fal.ai/dbt-fal
Apache License 2.0
853 stars 72 forks source link

Option to set custom dtype values for models using postgres adapter #768

Open kinghuang opened 1 year ago

kinghuang commented 1 year ago

Context

I often work with PostgreSQL jsonb columns in my dbt models. With dbt-fal and a base postgres profile, there isn't a way to pass dtype options to SQLAlchemy, making it impossible to generate jsonb and similar column types in the resulting table.

Is your feature request related to a problem? Please describe.

In my test Python model, one of the dataframe columns contains JSON serialized data. The resulting table in PostgreSQL has this column as a text type. But, what I want is for the database to store that data as a jsonb type. I need a way to specify the desired type to the dataframe.to_sql call that dbt-fal performs.

Describe the solution you'd like

Perhaps a config option?

import sqlalchemy.types as st

def model(dbt, fal):
    dbt.config(fal_to_sql_dtype={"column_name": st.JSON})
    …

postgres.write_df_to_relation would then pass this to data.to_sql.

dtype = ???  # Get the config here

rows_affected = data.to_sql(
    con=alchemy_engine,
    name=temp_relation.identifier,
    schema=temp_relation.schema,
    if_exists=if_exists,
    index=False,
    method=_psql_insert_copy,
        dtype=dtype,
)

Describe alternatives you've considered

I tried to use a dbt post-hook config on the model to alter the column type. But, my post-hook statements never get called. I'm not sure why this is.

version: 2
models:
  - name: model_name_here
    config:
      post-hook:
        - "ALTER TABLE {{ this }} ALTER COLUMN index_sequences SET DATA TYPE jsonb USING index_sequences::jsonb"

Additional context

This is different than specifying dtypes on the dataframe columns themselves. i.e., the data is correctly treated as object/string in the dataframe. It's when then dataframe is materialized in SQL that the JSON type applies.

Is there an existing feature request for this?