dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[Regression] dbt seed is not accepting custom delimiter in the seed configs #1352

Closed roshravoof closed 3 weeks ago

roshravoof commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

dbt seed does not accept a custom (pipe) delimiter set in the seed config:

seeds:
  - name: mappings
    config:
      delimiter: '|'

The seed config above doesn't work in dbt version 1.7.18.

Expected Behavior

dbt seed should honor any custom delimiter set in the seed configs, including different delimiters for different seeds. dbt seed should be able to process comma-delimited and pipe-delimited files in the same project.
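
As a rough illustration of the expected behavior, here is a minimal sketch using only Python's standard csv module. It is not how dbt or dbt-bigquery loads seeds; the names `SEED_CONFIGS`, `SEED_FILES`, and `read_seed`, and the `countries` seed, are hypothetical. Each seed is read with its own `delimiter` if one is configured, and with a comma otherwise:

```python
import csv
import io

# dbt's documented default seed delimiter is a comma.
DEFAULT_DELIMITER = ","

# Hypothetical per-seed configs, mirroring the `delimiter` key shown above.
SEED_CONFIGS = {
    "mappings": {"delimiter": "|"},
    "countries": {},  # no override: falls back to the default comma
}

# Hypothetical seed file contents, one pipe-delimited and one comma-delimited.
SEED_FILES = {
    "mappings": "id|alpha\n1|A\n",
    "countries": "code,name\nNZ,New Zealand\n",
}


def read_seed(name: str) -> list[list[str]]:
    """Parse one seed using its configured delimiter, or the default."""
    delimiter = SEED_CONFIGS.get(name, {}).get("delimiter", DEFAULT_DELIMITER)
    reader = csv.reader(io.StringIO(SEED_FILES[name]), delimiter=delimiter)
    return list(reader)


for seed_name in SEED_FILES:
    print(seed_name, read_seed(seed_name))
```

Both seeds parse into two columns here because the delimiter is resolved per seed rather than once for the whole project.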

Steps To Reproduce

Set up dbt version 1.7.18 and Python version 3.11.

Set up the seed config:

seeds:
  - name: mappings
    config:
      delimiter: '|'

Relevant log output

No response

Environment

- Python: 3.11
- dbt: 1.7.18

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

dbeatty10 commented 1 month ago

Thanks for reporting this @roshravoof !

I was able to replicate what you described.

This looks like it only affects dbt-bigquery (and not dbt-postgres, dbt-snowflake, etc.), so I'm going to transfer this issue to the dbt-bigquery repo.

Reprex

Create these files:

seeds/mappings.csv

id|alpha
1|A
2|B
3|C

seeds/_seeds.yml

seeds:
  - name: mappings
    config:
      delimiter: '|'

Run these commands:

dbt seed

See this output in dbt 1.6:

$ dbt seed
12:47:00  Running with dbt=1.6.5
12:47:34  Registered adapter: bigquery=1.6.9
12:47:34  Unable to do partial parsing because saved manifest not found. Starting full parse.
12:47:35  Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
12:47:35  
12:47:59  Concurrency: 10 threads (target='blue')
12:47:59  
12:47:59  1 of 1 START seed file dbt_dbeatty.mappings .................................... [RUN]
12:48:05  1 of 1 OK loaded seed file dbt_dbeatty.mappings ................................ [INSERT 3 in 5.95s]
12:48:05  
12:48:05  Finished running 1 seed in 0 hours 0 minutes and 30.38 seconds (30.38s).
12:48:05  
12:48:05  Completed successfully
12:48:05  
12:48:05  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

See this output in dbt 1.7 and 1.8:

$ dbt seed
12:48:57  Running with dbt=1.7.11
12:48:59  Registered adapter: bigquery=1.7.8
12:48:59  Unable to do partial parsing because saved manifest not found. Starting full parse.
12:49:00  Found 1 model, 1 seed, 0 sources, 0 exposures, 0 metrics, 454 macros, 0 groups, 0 semantic models
12:49:00  
12:49:33  Concurrency: 10 threads (target='blue')
12:49:33  
12:49:33  1 of 1 START seed file dbt_dbeatty.mappings .................................... [RUN]
12:49:36  1 of 1 ERROR loading seed file dbt_dbeatty.mappings ............................ [ERROR in 3.50s]
12:49:36  
12:49:36  Finished running 1 seed in 0 hours 0 minutes and 36.08 seconds (36.08s).
12:49:36  
12:49:36  Completed with 1 error and 0 warnings:
12:49:36  
12:49:36    Runtime Error in seed mappings (seeds/mappings.csv)
  Error while reading data, error message: CSV processing encountered too many errors, giving up. Rows: 0; errors: 3; max bad: 0; error percent: 0
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 2 byte_offset_to_start_of_line: 9 column_index: 1 column_name: "alpha" column_type: STRING
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 3 byte_offset_to_start_of_line: 13 column_index: 1 column_name: "alpha" column_type: STRING
  Error while reading data, error message: CSV table references column position 1, but line contains only 1 columns.; line_number: 4 byte_offset_to_start_of_line: 17 column_index: 1 column_name: "alpha" column_type: STRING
  You are loading data without specifying data format, data will be treated as CSV format by default. If this is not what you mean, please specify data format by --source_format.
12:49:36  
12:49:36  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
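
The BigQuery error above is consistent with the file being split on the default comma instead of the configured pipe. A minimal sketch using only Python's standard csv module (an illustration, not dbt-bigquery's actual load path; the helper names `column_counts` and `line_start_offsets` are hypothetical) shows the same symptom for the reprex data:

```python
import csv
import io

# Contents of seeds/mappings.csv from the reprex above.
SEED_TEXT = "id|alpha\n1|A\n2|B\n3|C\n"


def column_counts(text: str, delimiter: str) -> list[int]:
    """Return the number of fields found on each line for a given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


def line_start_offsets(text: str) -> list[int]:
    """Return the byte offset at which each line starts."""
    offsets, position = [], 0
    for line in text.splitlines(keepends=True):
        offsets.append(position)
        position += len(line.encode("utf-8"))
    return offsets


# Splitting on a comma finds one field per line, so there is no value
# at column position 1 (the "alpha" column).
print(column_counts(SEED_TEXT, ","))   # [1, 1, 1, 1]

# Splitting on the configured pipe finds both columns.
print(column_counts(SEED_TEXT, "|"))   # [2, 2, 2, 2]

# Lines 2-4 start at bytes 9, 13 and 17, matching the error messages.
print(line_start_offsets(SEED_TEXT))   # [0, 9, 13, 17]
```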

simbazzuk commented 1 month ago

@dbeatty10 is this still an issue? It works with the latest version. Is there anything I can do to help on this issue?

colin-rogers-dbt commented 1 month ago

Took a look at this. What's interesting is that it seems to work if you set it in dbt_project.yml like:

seeds:
  jaffle_shop:
    mappings:
      config:
        delimiter: '|'

colin-rogers-dbt commented 1 month ago

Will investigate what/how dbt-bigquery is handling this differently

colin-rogers-dbt commented 3 weeks ago

After much investigating, it's still not clear exactly what broke this functionality in 1.7, but I can confirm it works in 1.6. This was already being fixed (see #1122) for the upcoming 1.9 release, and we'll look at backporting the fix to 1.8 as well.