OHDSI / Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about a OMOP CDM database
https://ohdsi.github.io/Achilles/
130 stars 121 forks source link

Case Sensitive Table Names #758

Open Zachary-Higgins opened 7 months ago

Zachary-Higgins commented 7 months ago

We're using Broadsea to connect to a databricks/spark backend. We found some issues where the WebAPI is using queries that reference some tables in all upper case. In some cases they are all lower case.

We can disable case sensitivity on our compute clusters, however given the introduction of SQL Serverless Warehouses, we are trying to take advantage of databricks managed clusters (they are significantly cheaper and we don't need to host the infra). In this scenario, we aren't able to disable case sensitivity like we can in our traditional compute clusters using spark.sql.caseSensitive false

It would be nice if we the Achilles table creation scripts would use the same case sensitivity that the web API is expecting. Or vice versa. I'm not sure where the preferential place to control this would be. I would love to work more closely with someone to provide feedback and explore ways to improve Databricks/Spark compatibility.

Here's an example script that fails when we try to load the dashboard in Atlas

(SELECT
 aa1.analysis_name AS attribute_name,
 ar1.stratum_1 AS attribute_value
 FROM hive_metastore.omop541_pat15k.ACHILLES_analysis aa1
INNER JOIN
hive_metastore.omop541_pat15k.achilles_results ar1
ON aa1.analysis_id = ar1.analysis_id
 WHERE aa1.analysis_id = 0
 UNION
SELECT
 aa1.analysis_name AS attribute_name,
 cast(ar1.count_value AS STRING) AS attribute_value
FROM hive_metastore.omop541_pat15k.ACHILLES_analysis aa1
INNER JOIN
hive_metastore.omop541_pat15k.achilles_results ar1
ON aa1.analysis_id = ar1.analysis_id
WHERE aa1.analysis_id = 1
)
ORDER BY attribute_name DESC

But the Achilles generated SQL scripts creates this table in lower case:

--HINT DISTRIBUTE_ON_KEY(analysis_id) 
CREATE TABLE hive_metastore.omop541_pat15k.achilles_results
USING DELTA
AS
SELECT
analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, stratum_5, count_value
FROM
(
............