VasuBalakrishnan closed 1 year ago
@VasuBalakrishnan Thanks for opening, and for bringing my attention to the NULL_IF format parameter.
Instead of reinventing the wheel with our own SQL logic, it sounds like you could achieve the desired behavior via:
external:
  file_format: "(
    type = csv
    null_if = ('null', '', 'n/a')
  )"
That feels better than trying to write the perfect SQL column expression on our own, with all the same configurability as Snowflake's format type options.
As an unrelated note: It does seem worthwhile to refactor file_format to support dictionary specification, in addition to strings, so that you could write this a little more nicely, e.g.:
external:
  file_format:
    type: csv
    null_if: "('null', '', 'n/a')"
Linking https://github.com/dbt-labs/dbt-external-tables/issues/132 here too.
I think adding null_if to the end of the create or replace external table ... file format ... statement would be a good idea to avoid expanding case statements, as you pointed out, @jtcohen6.
With your suggested pattern / refactor to support dict, I think we should be able to just change these bits:
To:
{% if external.integration -%} integration = '{{external.integration}}' {%- endif %}
file_format = {{external.file_format.type}}
{% if external.file_format.null_if -%} null_if = {{external.file_format.null_if}} {%- endif %}
{% endmacro %}
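To make the dictionary idea concrete, here is a minimal sketch, in Python rather than the package's Jinja, of how a file_format value given as either a string or a dictionary could be templated into a single option string. The function name render_file_format and the choice to accept null_if as either a list or a pre-formatted string are assumptions for illustration only; none of this is taken from the dbt-external-tables source.

```python
def render_file_format(file_format):
    """Render a file_format config as a Snowflake-style option string.

    Hypothetical helper: accepts the existing string form unchanged, or a
    dict such as {"type": "csv", "null_if": ["null", "", "n/a"]}.
    """
    # Existing string configs pass through untouched (backwards compatible).
    if isinstance(file_format, str):
        return file_format

    parts = []
    for key, value in file_format.items():
        if isinstance(value, (list, tuple)):
            # Quote each entry as a SQL string literal, doubling any
            # embedded single quotes, and wrap the list in parentheses.
            quoted = ", ".join(
                "'" + str(item).replace("'", "''") + "'" for item in value
            )
            value = f"({quoted})"
        parts.append(f"{key} = {value}")
    return "( " + " ".join(parts) + " )"


print(render_file_format({"type": "csv", "null_if": ["null", "", "n/a"]}))
# prints: ( type = csv null_if = ('null', '', 'n/a') )
```

Keeping the string pass-through means existing configs would not need to change, while the dictionary form lets options like null_if be set individually.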
To consolidate our findings, and my current feelings:
- [...] '' (empty string) to be null by default. That's standard behavior for CSV loading.
- [...] null value, we should make it a configurable part of the external table definition.
- [...] file_format options. It's listed as an option under CSV type here, though it's only listed under ORC type in the create external table docs.
- [...] file_format type from a user-supplied string (( type = csv )) to a user-supplied dictionary (file_format: {'type': 'csv'}) that the macro templates out into the appropriate string format.
- [...] file_format.null_if to be the same default values we've proposed ('null', ''). Worth thinking about whether we should preserve the is_null_value check even so.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
This is an extension to issue #63.
I would like to propose extending this feature, similar to CREATE FILE FORMAT's NULL_IF parameter, so that multiple values in the input can be treated as NULL instead of just the single value 'null'.
Proposed Solution:
Extend the external yaml section with a null_if parameter
Update the snowflake__create_external_table macro to read the above setting and modify the case statement to include these values
This would translate the above input null_if values to