dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0
838 stars 160 forks source link

Bug: CLI vars are not properly split when containing commas #1737

Closed federicojasson closed 1 month ago

federicojasson commented 4 months ago

Variables passed using --vars are not properly split when containing commas.

Reproduce

Run daform compile --vars MY_VAR_1="var1,FAKE_VAR=test",MY_VAR_2="var2"

Expected dataform.projectConfig.vars value:

{ MY_VAR_1: 'var1,FAKE_VAR=test', MY_VAR_2: 'var2' }

Actual dataform.projectConfig.vars value:

{ MY_VAR_1: 'var1', FAKE_VAR: 'test', MY_VAR_2: 'var2' }

Version: 3.0.0-beta.4 UPDATE: still happening on version 3.0.0.

Related code: https://github.com/dataform-co/dataform/blob/f23795ab4424ac592a8ced9b49d11fa2790882ca/cli/index.ts#L690-L696

Ekrekr commented 1 month ago

This is actually a unix thing, you can test it by running:

$ echo MY_VAR_1="var1,FAKE_VAR=test",MY_VAR_2="var2"
MY_VAR_1=var1,FAKE_VAR=test,MY_VAR_2=var2

So the quotes are actually gone before they reach the CLI.

If you put backslashes in before the quotes, you should get the desired functionality though:

$ echo MY_VAR_1=\"var1,FAKE_VAR=test\",MY_VAR_2=\"var2\"
MY_VAR_1="var1,FAKE_VAR=test",MY_VAR_2="var2"
federicojasson commented 1 month ago

@Ekrekr Have you tested that workaround? I'm not seeing the expected behaviour either:

dataform compile --default-database "some-db" --default-location "US" --default-schema "some-schema" --vars MY_VAR_1=\"var1,FAKE_VAR=test\",MY_VAR_2=\"var2\" --json

Results in:

{
    "projectConfig": {
        "warehouse": "bigquery",
        "defaultSchema": "some-schema",
        "defaultDatabase": "some-db",
        "vars": {
            "MY_VAR_1": "\"var1",
            "FAKE_VAR": "test\"",
            "MY_VAR_2": "\"var2\""
        },
        "defaultLocation": "US"
    },
    "graphErrors": {},
    "dataformCoreVersion": "3.0.0"
}

What you said about quotes not reaching the CLI is correct. However, the CLI code itself is not prepared to handle quotes properly. The example shows two things:

Ekrekr commented 1 month ago

Still, closer! Would love if you could submit a patch for this.

mrmeszaros commented 3 weeks ago

Hey, I have a similar issue, where one of my vars is a JSON string.

datafrom run --vars='some_ids=[1,2,3],other_var=other_value'

What I would expect in the dataform is:

session.projectConfig.vars.some_ids == '[1,2,3]'
session.projectConfig.vars.other_var == 'other_value'

So that I can then JSON.parse them on the receiving end before using inside of an SQLX.

@Ekrekr Could You explain a bit more how can I pass a string containing a comma into dataform?

PS.: Based on @federicojasson's link to the code, it seems to me that the processing code splits on EVERY comma, no matter how it got passed from the shell.