dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.01k stars 1.63k forks source link

DBT Seed file doesnot support commas in country code names #2692

Closed chaitanyabheri closed 4 years ago

chaitanyabheri commented 4 years ago

Describe the bug

While trying to load country codes list using seed files, seed file processing fails with comma values in country names.

Steps To Reproduce

Create country_codes.csv file with country_name column

country_code,country_name FM,Micronesia,Federated States of

Expected behavior

Insert data into table

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem. image

System information

Which database are you using dbt with?

The output of dbt --version:

0.17.0

The operating system you're using:

The output of python --3.6:

Additional context

Add any other context about the problem here.

jtcohen6 commented 4 years ago

Hey @chaitanyabheri, this isn't a bug with dbt seed so much as a limitation of the CSV file format. Generally speaking, commas are field delimiters; if a field contains a comma, that comma needs to be escaped, or the field enclosed, by way of additional characters.

For the CSV inference that dbt seed does (using the agate library), the field needs to be enclosed in quotation marks:

country_code,country_name
FM,"Micronesia,Federated States of"

Give that a go, and feel free to comment here if you're still running into issues.

timothyjang123 commented 2 years ago

Hey @jtcohen6, I stumbled on this thread and am running into the same thing and noticed some pretty undesirable behavior from the agate side. It looks like if we have a quoted value, we can't put spaces around it or it won't parse correctly.

For example: country_code,country_name FM, "Micronesia,Federated States of"

The above would fail because of the single space before the value of country_name. I don't think there's an action-item here from the dbt side, just wanted to point this out to hopefully save someone some time..

jens-rob commented 2 years ago

dbt will include the quote marks as characters in the values when it inserts the data if you quote your strings.

core version: 1.1.2 dbt-sqlserver version: 1.1.0

Maclenn77 commented 1 month ago

Updating from dbt core version 1.0.0 to 1.87

It doesn't seem to correctly handling commas inside quotation marks. As an example, csv row has "Bogota, Lima, Peru" but it only shows "Bogota and doesn't show following text. At least it's not failing

image

Maclenn77 commented 1 month ago

Okay, self-serviced. @timothyjang123 it's right. If you let a blank space, agate won't parse correctly quotation marks.