OHDSI / SqlRender

This is an R package and Java library for rendering parameterized SQL, and translating it to different SQL dialects.
https://ohdsi.github.io/SqlRender
Other
82 stars 77 forks source link

`SqlRender::translate` replaces # with garbage for rdbms='bigquery' #263

Closed dev-michael-schmidt closed 3 years ago

dev-michael-schmidt commented 3 years ago

Hi,

I'm using OHDSI/ETL-Sythnea which in turn uses DatabaseConnector and SqlRender. When using the DDL hosted here: https://raw.githubusercontent.com/OHDSI/CommonDataModel/v5.3.1/BigQuery/OMOP%20CDM%20bigquery%20ddl.txt

SqlRender replaces the #standardsql directive with garbage when the rdbms is set to 'bigquery'. For example: /*OMOP CDM V5.3.1 14JUNE2018*/\nc6k73saustandardsql\ncreate table ...

The process is easy to reproduce:

webResponse <- httr::GET("https://raw.githubusercontent.com/OHDSI/CommonDataModel/v5.3.1/BigQuery/OMOP%20CDM%20bigquery%20ddl.txt")

tableDDL <- httr::content(webResponse)
tableDDL <- toupper(tableDDL)
print(substr(tableDDL, 0, 50))
tableDDL <- SqlRender::translate(sql = tableDDL, targetDialect = 'bigquery')
print(substr(tableDDL, 0, 50))

Results in:

[1] "/*OMOP CDM V5.3.1 14JUNE2018*/\n#STANDARDSQL\nCREATE"
[1] "/*OMOP CDM V5.3.1 14JUNE2018*/\nc7zradhostandardsql"
schuemie commented 3 years ago

The hash sign indicates a temp table in the source dialect (what we tend to refer to as OHDSISQL, which is basically a subset of SQL Server SQL). And because BigQuery doesn't support temp tables, SqlRender appends a random string to make a unique permanent table name. It is actually also expecting you to specify a tempEmulationSchema so it knows where to create this permanent table emulating a temp table:

sql <- "CREATE TABLE #my_temp (x INT)";
SqlRender::translate(sql, "bigquery", tempEmulationSchema = "scratch")
# [1] "create table scratch.u8jiwwzamy_temp (x INT64)"

I'm unfamiliar with the #STANDARDSQL directive. Why do you need it?

dev-michael-schmidt commented 3 years ago

I'm actually not using the temp table (nor the directive). The OHDSI/ETL-Synthea package is using it via SqlRender by means of the OHDSI/CommonDataModel raw.githubusercontent.com link above. I'm new to BQ, but from my research BQ support 2 "dialects": standard and legacy. As for the specifics, I couldn't say.

A solution might look like: 1) Parse #STANDARDSQL and #LEGACYSQL since this is valid SQL for BQ dialects. 2) Update OHDSI/CommonDataModel's DDL to not include the directive. 3) Ignore these directives, support x dialect, and throw an error in the event of an unsupported dialect.

schuemie commented 3 years ago

But #STANDARDSQL and #LEGACYSQL are not valid statements in the source dialect, only in BigQuery.

I may be missing something, but why are you trying to translate BigQuery SQL to BigQuery SQL?

dev-michael-schmidt commented 3 years ago

I'm not trying to translate anything but instead using OHDSI/ETL-Sythnea which depends on OHDSI/SqlRender. ETL-Synthea supports BigQuery, and when BigQuery is selected, ETL-Synthea fetches this DDL.

This DDL is sent to SqlRender. The DDL contains those directives and those directives are specified here.

konstjar commented 3 years ago

@schuemie we will look into it

konstjar commented 3 years ago

I checked ETL-Synthea code and it looks like it does double dialect translation by taking translated DDL code and running SQLRender::translate. I think it a bug here: https://github.com/OHDSI/ETL-Synthea/blob/master/R/CreateCDMTables.r#L195

schuemie commented 3 years ago

Ah, thanks for digging into this. Would you mind creating an issue in the ETL-Synthea repo?

dev-michael-schmidt commented 3 years ago

Will do. Thanks.