dbt-labs / dbt-adapters

Apache License 2.0
22 stars 33 forks source link

[Feature] Escape `"` within `adapter.quote` #105

Open smilingthax opened 6 months ago

smilingthax commented 6 months ago

Is this a new bug?

Current Behavior

{% do print(adapter.quote('foo"bar')) %}

gives "foo"bar".

Expected Behavior

It should return "foo""bar" (Postgres, ... other DBs might differ?

Steps To Reproduce

E.g. via dbt_project.yaml:

on-run-start: "{{ print(adapter.quote('foo\"bar')) }}"

Relevant log output

No response

Environment

- OS: linux
- dbt-adapter: git HEAD (tested w/: dbt-core: 1.7.6)

Additional Context

No response

dbeatty10 commented 6 months ago

@smilingthax Thank you for opening this issue along with https://github.com/dbt-labs/dbt-adapters/discussions/106 !

Please share more about what you are trying to do in these situations. The more detail the better 🙏

Also: Are you using dbt-postgres, or is it another database adapter that you are using?

It sounds like you are primarily trying to use adapter.quote, but you also tried using these in conjunction:

This looks like a feature request to me, so I'm going to switch the categorization for now. We can switch it back to a bug later as-needed.

smilingthax commented 6 months ago

I'm currently using dbt-postgres (might change in the future).

106 is somewhat related (quote/escape), but deals with values, not table/column/... names.

Generally, I'm working on some macros (for example: to handle migrations to change database-objects not currently tracked/supported by dbt), which should be "safe" to use.
I've come across cases where I insert caller-provided column names (or values) into SQL, but ran into problems when testing with names(or string) containing "dangerous" sql characters (single quotes, double quotes).

Contrary to my expectation from other sql libraries (e.g. node-postgres: pgClient.escapeIdentifier, for values: php PDO::quote()), the implementation in dbt currently requires the user/caller to pre-sanitise input to adapter.quote(...) (or string_literal(...)). And, for quote() I'd even have to roll something akin to escape_double_quotes myself (not too difficult, but then certain database types might need a different treatment?)

IMHO I don't think this is the right choice. The default behaviour should be safe (yes, parameters, or sql, used with dbt do not usually come from totally 'untrusted' users, the possibility of sql-level injections is still quite undesired).
Also, some (most?) current uses of these functions (GitHub search) also do not seem to care about, or aren't even aware of, the possibility that strings they operate on could contain 'unsafe' characters which are not escaped by the current implementation.

dbeatty10 commented 6 months ago

It sounds like you'd like the following (correct me if I'm wrong):

  1. An adapter-specific dbt macro similar to pg.escapeIdentifier that will escape all relevant characters of a Jinja string that represents an unquoted database identifier (table/column/object name).
  2. The adapter-specific implementations of adapter.quote utilize that macro so you can pass in any string and get a quoted identifier that is "safe" to use.

Example

E.g., you'd like to be able to do the following:

models/my_model.sql

{%- set unquoted_identifier = 'foo"bar' -%}
{%- set quoted_identifier = adapter.quote(unquoted_identifier) -%}

select 1 as {{ quoted_identifier }}

And have each of these commands give the following output:

$ dbt compile -s my_model

18:51:42  Compiled node 'my_model' is:
select 1 as "foo""bar"
$ dbt show -s my_model

18:50:34  Previewing node 'my_model':
| foo"bar |
| ------- |
|       1 |

Additional context

Long term, we're interested in revisiting our implementations related quoting and the user interfaces (https://github.com/dbt-labs/dbt-core/issues/2986).

In the meantime, we've got a lot of different quoting-related things we'd like to document more thoroughly (https://github.com/dbt-labs/docs.getdbt.com/issues/3518)

smilingthax commented 6 months ago
  1. An adapter-specific dbt macro similar to pg.escapeIdentifier that will escape all relevant characters of a Jinja string that represents an unquoted database identifier (table/column/object name).

Well, despite the namepg.escapeIdentifier('foo"bar') -> "foo""bar" also adds the outer quotes (i.e. 2.). A separate function is seldom provided/needed (well, sqlite_mprintf / format() %w comes to mind, but for values also provides %Q, which does include the outer quotes [and handles null specially]...).

And wrt. to #106, I think escape_single_quote(...) could have its justification, but it's strictly speaking not the right thing use with string_literal(...), because whether string_literal(...) actually uses single quotes is an implementation detail.

  1. The adapter-specific implementations of adapter.quote utilize that macro so you can pass in any string and get a quoted identifier that is "safe" to use.

Yes. I even doubt there a sensible use-case for the current behaviour which is not a (hidden) bug (waiting for it's time to show off).