OHDSI / SqlRender

This is an R package and Java library for rendering parameterized SQL, and translating it to different SQL dialects.
https://ohdsi.github.io/SqlRender
Other
82 stars 77 forks source link

Formally define OHDSI-SQL #290

Open ablack3 opened 2 years ago

ablack3 commented 2 years ago

SqlRender performs substitution on strings of text but does not guarantee that its output will actually run on the desired dbms. This is because it does not check that it's input is valid OHDSI-SQL.

What if we thought of OHDSI-SQL as a sql dialect and formally defined it's structure so that we could

The definition would look something like

image

reference: https://duckdb.org/docs/sql/statements/select

schuemie commented 2 years ago

This is somewhat related to https://github.com/OHDSI/SqlRender/issues/157, but I like the idea of being able to validate the input. I looked into SQL parsers in the past, but found them hard to implement in the SqlRender syntax (i.e. I couldn't find a good Java library).

ablack3 commented 2 years ago

Yea I think this is really the same issue as #157 but with a focus on a formal definition of OHDSI-SQL syntax that could be checked and parsed programmatically possibly with javacc.

Artem implemented duckdb sql translations but we don't really know if we have implemented all of OHDSI SQL because we don't really know the full domain of inputs to translate.

schuemie commented 2 years ago

Yes, I tend to use this section as an informal definition of the scope of SqlRender.

ablack3 commented 2 years ago

@SofiaMp from Kheiron is interested in taking this on. My description of this task is that we want to formally specify the OHDSI-SQL language since it is a programming language.

I would also suggest we separately specify the inputs to render and translate.

render: {parameterized OHDSI SQL} x {list of name value pairs} -> {non parameterized OHDSI SQL} translate: {non-parameterized OHDSI SQL} -> {sql server, bigquery, impala, ....}

{parameterized OHDSI SQL} can contain branching logic and parameters while {non-parameterized OHDSI SQL} should not.

We need to define the complete syntax and semantics of OHDSI-SQL. Most of the work will be in defining the syntax. The only semantics that need to be defined are the non-SQL elements like branching logic, parameters, and default value declarations which are all done in the SqlRender vignettes. We don't need define the semantics of SQL.

I'll leave it up to @SofiaMp about how to approach this but the result will be a document (a new vignette perhaps) that provides a complete description of the OHDSI-SQL syntax that could be used to implement a parser.

The section @schuemie wrote is a great start https://ohdsi.github.io/SqlRender/articles/UsingSqlRender.html#functions-and-structures-supported-by-translate. @SofiaMp - think about what is missing here for the complete syntax definition? For example what are the function signatures and valid argument values? Does DATEDIFF support the firstdayofweek argument described in the MSSQL documentation?

https://homepage.cs.uri.edu/faculty/hamel/courses/2018/fall2018/csc301/lecture-notes/csc301-ln002.pdf https://en.wikipedia.org/wiki/Syntax_(programming_languages)