aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0

Specifying column_types doesn't work on seeds. #458

Open jausanca opened 1 month ago

jausanca commented 1 month ago

Describe the bug

When running dbt seed, the column types specified in the seed properties are not applied; the columns are created as string instead. In addition, if a column is empty, the run fails with an error saying the type cannot be inferred, even though a type was specified for that column.

Steps To Reproduce

Run dbt seed with column_types specified in the seed's .yml properties file.
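A minimal reproduction can be set up with a seed that has an int column, a date column, and a column that is entirely empty, plus a properties file declaring column_types under the seed's config block. The sketch below writes those two files; the seed name my_seed, the file name properties.yml, the column names, and the sample values are all hypothetical, and it assumes the default seeds/ seed path.

```python
from pathlib import Path

# Hypothetical seed data: an int column, a date column, and an all-empty column.
SEED_CSV = """id,created_at,note
1,2024-01-15,
2,2024-02-01,
"""

# Hypothetical seed properties declaring column_types in the seed's config block.
SEED_PROPERTIES = """version: 2

seeds:
  - name: my_seed
    config:
      column_types:
        id: int
        created_at: date
        note: string
"""


def write_repro_seed(project_dir: Path) -> Path:
    """Write the seed CSV and its properties file into <project_dir>/seeds."""
    seeds_dir = project_dir / "seeds"
    seeds_dir.mkdir(parents=True, exist_ok=True)
    (seeds_dir / "my_seed.csv").write_text(SEED_CSV)
    (seeds_dir / "properties.yml").write_text(SEED_PROPERTIES)
    return seeds_dir


if __name__ == "__main__":
    # Run from the root of the dbt project.
    write_repro_seed(Path("."))
```

After writing the files, running dbt seed --select my_seed and inspecting the resulting table should show id as int, created_at as date, and note as string; per this report, the columns come back as string instead, and the empty note column triggers the inference error.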

Expected behavior

The specified column types should be applied to the columns.

Actual behaviour

The columns are created with the string type.

System information

The output of dbt --version:

Core:
  - installed: 1.8.7
  - latest:    1.8.7 - Up to date!

Plugins:
  - spark: 1.8.0 - Up to date!

The output of python --version:

Python 3.10.12

Additional context

It looks like the issue is that when a column type is specified, dbt Core casts the column as a string and leaves the actual type casting to the adapter. In addition, when the CSV data is passed through the executed statement, the Spark DataFrame is created without a schema. This causes the seed to fail when a column is empty, and some data types, such as dates, are lost in the conversion.
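To illustrate the difference between schema-less loading and loading driven by the declared column_types, here is a toy model in plain Python. It is not the adapter's code: infer_type stands in for schema-less inference (CSV cells are all text, and an all-empty column gives nothing to infer from), while load_with_column_types converts each cell using the declared type, so empty cells become None and dates stay dates. The CONVERTERS mapping and both function names are hypothetical and cover only a few type names. In PySpark, the analogous fix would presumably be passing an explicit schema to SparkSession.createDataFrame rather than letting Spark infer one.

```python
import csv
import io
from datetime import date
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from column_types strings to Python converters.
# Undeclared or unsupported type names are not handled (KeyError).
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "bigint": int,
    "double": float,
    "string": str,
    "date": date.fromisoformat,
}


def infer_type(values: List[str]) -> str:
    """Toy model of schema-less inference: only the data is consulted."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        # Nothing to look at, so the type cannot be inferred.
        raise ValueError("cannot infer type of an all-empty column")
    # Every value read from a CSV is text, so it stays a string.
    return "string"


def load_with_column_types(
    csv_text: str, column_types: Dict[str, str]
) -> List[Dict[str, Optional[object]]]:
    """Convert each CSV cell with its declared type; empty cells become None."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        typed: Dict[str, Optional[object]] = {}
        for name, raw in record.items():
            # Columns without a declared type fall back to string.
            convert = CONVERTERS[column_types.get(name, "string")]
            typed[name] = None if raw == "" else convert(raw)
        rows.append(typed)
    return rows
```

With the declared types {"id": "int", "created_at": "date", "note": "string"}, the row 1,2024-01-15, loads as {"id": 1, "created_at": date(2024, 1, 15), "note": None}, whereas infer_type raises on the empty note column.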