dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Parquet Example #80

Closed: yapnel closed this issue 2 years ago

yapnel commented 3 years ago

Are there any examples of creating an external table for Parquet files on Spark? I'm struggling to define the specification. Any help is much appreciated.

Thanks

jtcohen6 commented 3 years ago

This isn't something I've actually done (parquet + spark + dbt-external-tables), but I think it ought to be possible. If someone else has had experience doing this, I'd love to hear about it!

pgoslatara commented 2 years ago

@yapnel Did you figure out how to do this?

I resorted to querying the Parquet files directly. This is my sources.yml:

sources:
  - name: parquet_source
    schema: parquet

    tables:
      - name: parquet_table_1
        identifier: '`dbfs:///mnt/landing/Source=Parquet/Table=parquet_table_1/`'

I can then refer to this directory of parquet files via:

SELECT * FROM {{ source('parquet_source', 'parquet_table_1') }}
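For context, the sources.yml above works because dbt-spark does not quote relation names by default. The source() call should therefore render as parquet.`dbfs:///mnt/landing/Source=Parquet/Table=parquet_table_1/`, which is Spark SQL's syntax for running a query directly on files. The original question, though, was about a real external table, and that is what dbt-external-tables is for. A config along the following lines may work. It is an untested sketch that assumes three things: the package's Spark implementation reads `location` and `using` from the `external` block, as its sample Spark sources file does; Spark infers the Parquet schema when no `columns` are listed; and `landing` is a hypothetical schema name.

```yaml
version: 2

sources:
  - name: parquet_source
    schema: landing  # hypothetical schema (Spark database) in which the table is created
    tables:
      - name: parquet_table_1
        external:
          # Directory containing the Parquet files
          location: 'dbfs:///mnt/landing/Source=Parquet/Table=parquet_table_1/'
          # File format passed to Spark's CREATE TABLE ... USING clause
          using: parquet
        # No columns listed: assumes Spark infers the schema from the Parquet files
```

Running `dbt run-operation stage_external_sources` should then create the table. After that, `{{ source('parquet_source', 'parquet_table_1') }}` refers to it like any other table instead of to a file path.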