dataform-co / dataform

Dataform is a framework for managing SQL based data operations in BigQuery
https://cloud.google.com/dataform/docs
Apache License 2.0

dataform cli format issue when columns contain "--" #1740

Open kevinzhou-izivia opened 4 months ago

kevinzhou-izivia commented 4 months ago
config {
  type: "table",
}

SELECT
  source_data_json.timestamp,
  source_data_json.`TEMPERATURE--BODY`,
  source_data_json.`ENERGY_ACTIVE_IMPORT_REGISTER--BODY`,

   FROM
  `XXX.YYY.ZZZ`

is formatted as

config {
  type: "table",
}

SELECT
  source_data_json.timestamp,
  source_data_json.`TEMPERATURE
  --BODY`,
  source_data_json.` ENERGY_ACTIVE_IMPORT_REGISTER
  --BODY`,
FROM
  `XXX.YYY.ZZZ`

expected:

config {
  type: "table",
}

SELECT
  source_data_json.timestamp,
  source_data_json.`TEMPERATURE--BODY`,
  source_data_json.`ENERGY_ACTIVE_IMPORT_REGISTER--BODY`,
FROM
  `XXX.YYY.ZZZ`

dataform --version returns 2.9.0

Ekrekr commented 4 months ago

@Ekrekr test that this is still reproducible after https://github.com/dataform-co/dataform/pull/1741 is merged.

Ekrekr commented 1 month ago

The issue is not fixed. I think it is caused by the lexing, where we still treat the contents of inner SQL string literals as comments even when we shouldn't: https://github.com/dataform-co/dataform/blob/2531b120c3869dbbc6efb4ecafbcfa9edb99c738/sqlx/lexer.ts#L374
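To illustrate the behaviour described above, here is a minimal sketch of a quote-aware scan that only reports `--` as a comment start when it appears outside a quoted region. It is not taken from `sqlx/lexer.ts`; the helper name `findLineCommentStart` is hypothetical. The sketch assumes single-line input and backslash escapes inside quotes, and it does not handle triple-quoted strings, `#` comments, or `/* */` comments.

```typescript
// Hypothetical helper for illustration only; this is not the Dataform lexer.
type Quote = "'" | '"' | "`";

function isQuote(ch: string): ch is Quote {
  return ch === "'" || ch === '"' || ch === "`";
}

// Returns the index of the first "--" that lies outside any quoted region,
// or -1 if the line contains no such comment marker.
function findLineCommentStart(sql: string): number {
  let open: Quote | null = null; // quote character currently open, if any
  for (let i = 0; i < sql.length; i++) {
    const ch = sql.charAt(i);
    if (open !== null) {
      if (ch === "\\") {
        i++; // assumption: a backslash escapes the next character inside quotes
      } else if (ch === open) {
        open = null; // matching quote closes the region
      }
      continue; // anything else inside quotes, including "--", is ignored
    }
    if (isQuote(ch)) {
      open = ch;
    } else if (ch === "-" && sql.charAt(i + 1) === "-") {
      return i;
    }
  }
  return -1;
}
```

With this approach, the line containing `TEMPERATURE--BODY` inside backticks yields -1 (no comment), while a line with a trailing `-- test` after a closed backtick identifier yields the position of that trailing marker.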

kevin-zhou-dev commented 2 weeks ago

Another error, possibly related. Run with Dataform CLI version 3.0.2.

Example: definitions/test.sqlx

config { type: "view"} 
WITH int_table AS (
SELECT id
FROM `my_dataset.my_table`) -- test
SELECT id
FROM int_table

Actual: running dataform format --actions="definitions/test.sqlx" returns an error:

Errors encountered during formatting: definitions/test.sqlx: Formatter unable to determine final formatted form.

Expected:

config {
  type: "view"
}

WITH
  int_table AS (
  SELECT
    id
  FROM
    `my_dataset.my_table`) -- test
SELECT
  id
FROM
  int_table