Remorph translates Snowflake SQL queries with try_parse_json to DatabricksSQL equivalent that has the from_json function. The from_json function expects a schema to be provided. The Remorph translation adds a placeholder for the schema to be provided by the user.
Sample Snowflake SQL query:
SELECT col_a, TRY_PARSE_JSON(col_a):field_1 AS field_1 FROM schema_a.table_a
Remorph translation to Databricks SQL:
/* The key change between the original and translated SQL queries is the method used to parse JSON data from a column. The original query usesTRY_PARSE_JSON(col_a):field_1to parse and extract thefield_1value from the JSON object incol_a. The translated query replaces this withFROM_JSON(col_a, {COL_A_SCHEMA}).field_1, introducing a schema definition{COL_A_SCHEMA}for parsing the JSON data incol_aand extracting thefield_1` value. This adjustment suggests a more structured approach to JSON parsing by explicitly defining the expected schema.
*/
SELECT
col_a,
FROM_JSON(col_a, {COL_A_SCHEMA}).field_1 AS field_1
FROM schema_a.table_a`
Proposed Solution
Improve the SQL translation comment by providing ways to determine the JSON schema on Databricks using either the schema_of_json or schema_of_json_agg function. Documentation links will also be helpful.
Is there an existing issue for this?
Category of feature request
Transpile
Problem statement
Remorph translates Snowflake SQL queries with
try_parse_json
to DatabricksSQL equivalent that has thefrom_json
function. Thefrom_json
function expects a schema to be provided. The Remorph translation adds a placeholder for the schema to be provided by the user.Sample Snowflake SQL query:
SELECT col_a, TRY_PARSE_JSON(col_a):field_1 AS field_1 FROM schema_a.table_a
Remorph translation to Databricks SQL:
/* The key change between the original and translated SQL queries is the method used to parse JSON data from a column. The original query uses
TRY_PARSE_JSON(col_a):field_1to parse and extract the
field_1value from the JSON object in
col_a. The translated query replaces this with
FROM_JSON(col_a, {COL_A_SCHEMA}).field_1, introducing a schema definition
{COL_A_SCHEMA}for parsing the JSON data in
col_aand extracting the
field_1` value. This adjustment suggests a more structured approach to JSON parsing by explicitly defining the expected schema. */SELECT col_a, FROM_JSON(col_a, {COL_A_SCHEMA}).field_1 AS field_1 FROM schema_a.table_a`
Proposed Solution
Improve the SQL translation comment by providing ways to determine the JSON schema on Databricks using either the
schema_of_json
orschema_of_json_agg
function. Documentation links will also be helpful.Additional Context
No response