Generate Source Yaml doesn't preserve capitalization on schema/table names, creates unusable yaml for BQ (probably others)

ryantimjohn commented 10 months ago

Describe the bug

Steps to reproduce

Generate source yaml against a table with capitalized schema/table names, with generate a source yaml with lower-case schema and table names.

Expected results

Preserve capitalization, preferable as an true/false in the function call.

Actual results

Screenshot 2023-09-19 at 11 48 41 AM

Screenshot 2023-09-19 at 11 48 54 AM

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/codegen
    version: 0.10.0

Which database are you using dbt with?

[x] bigquery

The output of dbt --version: 1.3

The operating system you're using: Linux (dbt Cloud)

The output of python --version: dbt Cloud

Are you interested in contributing the fix?

Yes! Point me to the lines of code that do this and I'll happily contribute to a fix!

az-hienas commented 9 months ago

+1 to this for Snowflake with dbt 1.5.0.

If I have a column name in my source called firstName (note the capitalisation) generate_source produces YAML with the column firstname, which in Snowflake will be interpreted as FIRSTNAME since I did not give (nor is there an option to quote the column names for the source YAML).

As a result if I add a not_null test on the firstname column, e.g config

- name: firstname
  data_type: varchar
  description: ""
  tests: 
    - not_null

the test will fail with: 000904 (42000): SQL compilation error: ... invalid identifier 'FIRSTNAME'.

That's obviously because the Snowflake column is firstName and neither selecting FIRSTNAME nor firstname will work.

Perhaps by default the generated YAML should be with lower case column names, but have an option to quote things / preserve capitalisation , e.g. for my case would produce

- name: firstName
  data_type: varchar
  description: ""
  quote: true

Then I can manually add my not null test to the above config and everything will work.

I'm working with a Snowflake source table with around 150 columns in which the capitalisation is all over the place (all lower, all upper, camel case, title case and combinations of the aforementioned) and I can't control the case of the source column names.

In this case codegen doesn't really help a lot since I would have to go through 150 columns and make sure the case is ok and manually add quote: true

gwenwindflower commented 5 months ago

Makes sense that this should be a top-level option you can apply from codegen -- I'm going to be attempting some work to catch this package up based on the open issues -- thanks for sharing this! I'll keep you posted.

jenna-jordan commented 5 months ago

Adding my vote that this would be a very useful parameter for have - both for generate_source_yaml and generate_model_yaml. Thank you @gwenwindflower for digging into this!

gwenwindflower commented 4 months ago

there ended up being an open PR to do this, so i'm working with author of that to get these changes in, ideally covering the entire database -> schema -> table -> column chain in sources.

ronlut commented 1 month ago

@gwenwindflower Joining the party here - any plan/progress with supporting quote param per column? All our columns in SF are quotes hence we need to add quote: true to all generated source columns Thanks :)

Currently I am using sed to add that param when generating: sed -E "s/([[:space:]]+)data_type:/\1quote: true\n\1data_type:/g"

urkle commented 1 week ago

This also affects the generate_model_yaml when the model file names have mixed-case. It generates model yaml output with all lower-cased table and field names.

dbt-labs / dbt-codegen