aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0

dbt seed failing when writing null values #430

Closed. jausanca closed this issue 2 months ago

jausanca commented 2 months ago

Describe the bug

dbt seed fails when a seed CSV contains null values (empty cells in the CSV).

Steps To Reproduce

Run dbt seed with a CSV seed that contains rows with empty values.

Screenshots and log output

Fragment from the error output:

NameError: name 'null' is not defined

System information

The output of dbt --version:

Core:
  - installed: 1.8.4
  - latest:    1.8.5 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - spark: 1.8.0 - Up to date!
  - glue:  1.8.1 - Up to date!

The output of python --version:

Python 3.10.12

Additional context

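From a quick look, I'd say it's related to how the table is serialized to JSON and then interpolated as a literal into the Python code that is executed on the session. Empty values are serialized as the JSON token null instead of Python's None, and since null is not a defined name in Python, the generated code raises the NameError above.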

...
    @available
    def create_csv_table(self, model, agate_table):
        session, client = self.get_connection()
        logger.debug(model)
        f = io.StringIO("")
        agate_table.to_json(f)
        if session.credentials.seed_mode == "overwrite":
            mode = "True"
        else:
            mode = "False"

        code = f'''
custom_glue_code_for_dbt_adapter
csv = {f.getvalue()}
df = spark.createDataFrame(csv)
table_name = '{model["schema"]}.{model["name"]}'
...
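
The failure described above can be reproduced without Glue or Spark. The following is a minimal sketch, not the adapter's actual code: the rows, the run_generated helper, and the variable names are hypothetical, and exec stands in for the remote session. It shows why interpolating JSON text into Python source breaks on null, and one possible workaround, which is to embed the JSON as a string literal and parse it inside the generated code. How Spark then handles the resulting None values is not covered here.

```python
import json

# Hypothetical seed rows; the empty CSV cell has already become None.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
payload = json.dumps(rows)  # JSON text, where None is written as null


def run_generated(code):
    """Execute generated source in a fresh namespace and return its `csv` value."""
    namespace = {}
    exec(code, namespace)
    return namespace["csv"]


# 1) Interpolating the JSON text directly, as in the fragment above:
#    the generated line contains the bare name `null`.
broken_code = f"csv = {payload}"
try:
    run_generated(broken_code)
    broken_error = None
except NameError as exc:
    broken_error = str(exc)

# 2) Possible workaround (an assumption, not the project's fix):
#    embed the JSON as a Python string literal and parse it in the generated code.
fixed_code = f"import json\ncsv = json.loads({payload!r})"
fixed_rows = run_generated(fixed_code)

print(broken_error)
print(fixed_rows)
```

Using repr() for the string literal keeps any quotes or backslashes in the seed data escaped correctly, and json.loads maps null back to None on the session side.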

jausanca commented 2 months ago

I'm also getting the error: module 'dbt.exceptions' has no attribute 'DbtDatabaseError'. That seems consistent with this commit on dbt-core, but I'm not sure whether the issue is related to my dbt version.
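
One generic way to cope with a class that has moved between modules across versions is to look it up in several candidate modules. The sketch below is only an illustration: first_available is a hypothetical helper, and the idea that DbtDatabaseError is importable from dbt_common.exceptions in dbt 1.8 is an assumption to verify against the installed packages, not something confirmed in this thread.

```python
import importlib


def first_available(module_names, attr):
    """Return `attr` from the first importable module in `module_names` that defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed in this environment; try the next candidate.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise AttributeError(f"{attr!r} not found in any of {list(module_names)}")


# Hypothetical usage (module paths are assumptions, check your installation):
# DbtDatabaseError = first_available(
#     ["dbt.exceptions", "dbt_common.exceptions"], "DbtDatabaseError"
# )
```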