databrickslabs / remorph

Cross-compiler and Data Reconciler into Databricks Lakehouse
Other
37 stars 23 forks source link

[FEATURE]: Improve documentation for from_json usage #783

Open ajalisatgi opened 2 months ago

ajalisatgi commented 2 months ago

Is there an existing issue for this?

Category of feature request

Transpile

Problem statement

Remorph translates Snowflake SQL queries with try_parse_json to DatabricksSQL equivalent that has the from_json function. The from_json function expects a schema to be provided. The Remorph translation adds a placeholder for the schema to be provided by the user.

Sample Snowflake SQL query: SELECT col_a, TRY_PARSE_JSON(col_a):field_1 AS field_1 FROM schema_a.table_a

Remorph translation to Databricks SQL: /* The key change between the original and translated SQL queries is the method used to parse JSON data from a column. The original query usesTRY_PARSE_JSON(col_a):field_1to parse and extract thefield_1value from the JSON object incol_a. The translated query replaces this withFROM_JSON(col_a, {COL_A_SCHEMA}).field_1, introducing a schema definition{COL_A_SCHEMA}for parsing the JSON data incol_aand extracting thefield_1` value. This adjustment suggests a more structured approach to JSON parsing by explicitly defining the expected schema. */

SELECT col_a, FROM_JSON(col_a, {COL_A_SCHEMA}).field_1 AS field_1 FROM schema_a.table_a`

Proposed Solution

Improve the SQL translation comment by providing ways to determine the JSON schema on Databricks using either the schema_of_json or schema_of_json_agg function. Documentation links will also be helpful.

Additional Context

No response