OHDSI / Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about an OMOP CDM database
https://ohdsi.github.io/Achilles/

Error translating to Spark SQL #686

Closed lcpinheiro closed 2 years ago

lcpinheiro commented 2 years ago

I ran Achilles from a local R/RStudio instance connected to a Spark SQL database in Azure Databricks. There were 22 errors, all of which appear to be related to the SqlRender translation.
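
For context, the connection was set up with DatabaseConnector along these lines (a minimal sketch; the host, httpPath, and token are placeholders, and the exact connection string depends on your Databricks workspace and Simba driver version):

library(DatabaseConnector)

# Sketch only: replace the placeholders with workspace-specific values.
connectionDetails <- createConnectionDetails(
  dbms = "spark",
  connectionString = "jdbc:spark://<workspace-host>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=<http-path>",
  user = "token",
  password = Sys.getenv("DATABRICKS_TOKEN")
)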

Illustrative error log (truncated):

----Truncated----- Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

op_outside AS ( ^^^ SELECT COUNT(DISTINCT m.person_id) AS person_count FROM _----Truncated-----_

The SQL analyses generating an error log were 231, 232, 431, 432, 531, 532, 631, 632, 731, 732, 831, 832, 932, 932, 1031, 1032, 1331, 1332, 1831, 1832, 2131, 2132.
schuemie commented 2 years ago

Could you provide a bit more of the SQL statement (where you now say 'Truncated'), so we can find the specific original SQL statement that has the translation issue?

lcpinheiro commented 2 years ago

DBMS: spark

Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

op_outside AS ( ^^^ SELECT COUNT(DISTINCT vo.person_id) AS person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence vo LEFT JOIN hive_metastore.emis_imrd_202205.observation_period op ON vo.person_id = op.person_id AND vo.visit_start_date >= op.observation_period_start_date AND vo.visit_start_date <= op.observation_period_end_date WHERE op.person_id IS NULL ), vo_total AS ( SELECT COUNT(DISTINCT person_id) person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence ) CREATE TABLE hive_metastore.temp_db.wnu13lu3s_tmpach_231 USING DELTA AS (SELECT 231 AS analysis_id, CASE WHEN vo.person_count != 0 THEN CAST(CAST(1.0*op.person_count/vo.person_count AS FLOAT) AS STRING) ELSE CAST(NULL AS STRING) END AS stratum_1, CAST(op.person_count AS STRING) AS stratum_2, CAST(vo.person_count AS STRING) AS stratum_3, CAST(NULL AS STRING) AS stratum_4, CAST(NULL AS STRING) AS stratum_5, SIGN(op.person_count) AS count_value FROM op_outside op CROSS JOIN vo_total vo) at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:47) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:435) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:257) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123) at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:52) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:235) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:220) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:269) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0) == SQL == op_outside AS ( ^^^ SELECT COUNT(DISTINCT vo.person_id) AS person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence vo LEFT JOIN 
hive_metastore.emis_imrd_202205.observation_period op ON vo.person_id = op.person_id AND vo.visit_start_date >= op.observation_period_start_date AND vo.visit_start_date <= op.observation_period_end_date WHERE op.person_id IS NULL ), vo_total AS ( SELECT COUNT(DISTINCT person_id) person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence ) CREATE TABLE hive_metastore.temp_db.wnu13lu3s_tmpach_231 USING DELTA AS (SELECT 231 AS analysis_id, CASE WHEN vo.person_count != 0 THEN CAST(CAST(1.0*op.person_count/vo.person_count AS FLOAT) AS STRING) ELSE CAST(NULL AS STRING) END AS stratum_1, CAST(op.person_count AS STRING) AS stratum_2, CAST(vo.person_count AS STRING) AS stratum_3, CAST(NULL AS STRING) AS stratum_4, CAST(NULL AS STRING) AS stratum_5, SIGN(op.person_count) AS count_value FROM op_outside op CROSS JOIN vo_total vo) at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:298) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:159) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:88) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:106) at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:77) at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:97) at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:74) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$3(SparkExecuteStatementOperation.scala:337) at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:151) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$2(SparkExecuteStatementOperation.scala:337) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$1(SparkExecuteStatementOperation.scala:334) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:327) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.compileQuery(SparkExecuteStatementOperation.scala:334) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:390) ... 
16 more , Query: op_outside AS ( SELECT COUNT(DISTINCT vo.person_id) AS person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence vo LEFT JOIN hive_metastore.emis_imrd_202205.observation_period op ON vo.person_id = op.person_id AND vo.visit_start_date >= op.observation_period_start_date AND vo.visit_start_date <= op.observation_period_end_date WHERE op.person_id IS NULL ), vo_total AS ( SELECT COUNT(DISTINCT person_id) person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence ) CREATE TABLE hive_metastore.temp_db.wnu13lu3s_tmpach_231 USING DELTA AS (SELECT 231 AS analysis_id, CASE WHEN vo.person_count != 0 THEN CAST(CAST(1.0*op.person_count/vo.person_count AS FLOAT) AS STRING) ELSE CAST(NULL AS STRING) END AS stratum_1, CAST(op.person_count AS STRING) AS stratum_2, CAST(vo.person_count AS STRING) AS stratum_3, CAST(NULL AS STRING) AS stratum_4, CAST(NULL AS STRING) AS stratum_5, SIGN(op.person_count) AS count_value FROM op_outside op CROSS JOIN vo_total vo). SQL: op_outside AS ( SELECT COUNT(DISTINCT vo.person_id) AS person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence vo LEFT JOIN hive_metastore.emis_imrd_202205.observation_period op ON vo.person_id = op.person_id AND vo.visit_start_date >= op.observation_period_start_date AND vo.visit_start_date <= op.observation_period_end_date WHERE op.person_id IS NULL ), vo_total AS ( SELECT COUNT(DISTINCT person_id) person_count FROM hive_metastore.emis_imrd_202205.visit_occurrence ) CREATE TABLE hive_metastore.temp_db.wnu13lu3s_tmpach_231 USING DELTA AS (SELECT 231 AS analysis_id, CASE WHEN vo.person_count != 0 THEN CAST(CAST(1.0*op.person_count/vo.person_count AS FLOAT) AS STRING) ELSE CAST(NULL AS STRING) END AS stratum_1, CAST(op.person_count AS STRING) AS stratum_2, CAST(vo.person_count AS STRING) AS stratum_3, CAST(NULL AS STRING) AS stratum_4, CAST(NULL AS STRING) AS stratum_5, SIGN(op.person_count) AS count_value FROM op_outside op CROSS JOIN vo_total vo) R version: R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32 Attached base packages: - stats - graphics - grDevices - utils - datasets - methods - base Other attached packages: - CohortMethod (4.2.2) - FeatureExtraction (3.2.0) - Andromeda (0.6.1) - Cyclops (3.1.2) - SqlRender (1.10.0) - forcats (0.5.1) - stringr (1.4.0) - dplyr (1.0.9) - purrr (0.3.4) - readr (2.1.2) - tidyr (1.2.0) - tibble (3.1.7) - ggplot2 (3.3.6) - tidyverse (1.3.1) - Achilles (1.7) - DatabaseConnector (5.0.3)
schuemie commented 2 years ago

Thanks! This appears to be a SqlRender issue: https://github.com/OHDSI/SqlRender/issues/307

schuemie commented 2 years ago

I think I've now fixed it. Unfortunately, I don't have a Spark instance to test against. Could you give it a try? You'll need to install the develop version of SqlRender:

remotes::install_github("ohdsi/SqlRender", ref="develop")
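
As a quick check without rerunning all of Achilles, the failing pattern (a common table expression feeding a SELECT ... INTO) can be translated directly with SqlRender. The SQL below is a simplified stand-in for the Achilles analysis queries, not the exact statement:

library(SqlRender)

# Simplified OHDSI SQL that mimics the failing pattern: a WITH clause followed by SELECT ... INTO.
sql <- "WITH cte AS (SELECT person_id FROM @cdm.person) SELECT person_id INTO #tmp FROM cte;"
translate(render(sql, cdm = "hive_metastore.emis_imrd_202205"), targetDialect = "spark")
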
lcpinheiro commented 2 years ago

Ran it again and did not get the previous error; however, it does throw an error at the end of the analysis:

DBMS: spark

Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: drop index(line 1, pos 0)

== SQL == drop index hive_metastore.temp_db.idx_ar_aid ^^^

at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:47)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:435)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:257)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:52)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:235)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:220)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:269)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: drop index(line 1, pos 0)

== SQL == drop index hive_metastore.temp_db.idx_ar_aid ^^^

at org.apache.spark.sql.errors.QueryParsingErrors$.operationNotAllowedError(QueryParsingErrors.scala:276)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:50)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitFailNativeCommand$1(SparkSqlParser.scala:509)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:498)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:95)
at org.apache.spark.sql.catalyst.parser.SqlBaseParser$FailNativeCommandContext.accept(SqlBaseParser.java:1048)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:87)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:87)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$2(ParseDriver.scala:109)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:109)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:142)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:88)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:106)
at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:77)
at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:97)
at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:74)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$3(SparkExecuteStatementOperation.scala:337)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:151)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$2(SparkExecuteStatementOperation.scala:337)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$1(SparkExecuteStatementOperation.scala:334)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:327)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.compileQuery(SparkExecuteStatementOperation.scala:334)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:390)
... 16 more

, Query: drop index hive_metastore.temp_db.idx_ar_aid.

SQL: drop index hive_metastore.temp_db.idx_ar_aid

R version: R version 4.2.1 (2022-06-23 ucrt)

Platform: x86_64-w64-mingw32

Attached base packages:

Other attached packages:

lcpinheiro commented 2 years ago

The console error is:

Error in writeChar(report, fileConn, eos = NULL) : invalid 'nchars' argument

Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.

schuemie commented 2 years ago

Well, I think that is progress?

The new error message seems to be about being able to drop an index:

Operation not allowed: drop index(line 1, pos 0)

It appears Achilles is trying to drop, as a precaution, the indices that it is about to create. I don't think Spark even supports indices. You could probably avoid all of this by using createIndices = FALSE when calling achilles(), along the lines of the sketch below. Could you give that a try?
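
Something along these lines (a minimal sketch, reusing the schema names from your log; adjust to your setup, with connectionDetails being your DatabaseConnector connection details):

library(Achilles)

# createIndices = FALSE skips the DROP INDEX / CREATE INDEX statements that Spark cannot parse.
achilles(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "hive_metastore.emis_imrd_202205",
  resultsDatabaseSchema = "hive_metastore.temp_db",
  scratchDatabaseSchema = "hive_metastore.temp_db",
  createIndices = FALSE
)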

lcpinheiro commented 2 years ago

Indeed, it is!

I'll run it again. However, when I used the dropAllScratchTables function, I noticed that it is not removing them, but there's also no error generated, only warnings. 31 of these:

Temp table name '#s_f9stap8cs_dist_1816' is too long. Temp table names should be shorter than 22 characters to prevent Oracle from crashing.

I'll write some code to remove them and then run Achilles again. Thanks!
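
For anyone hitting the same thing, the cleanup I have in mind is roughly the following (a sketch; the schema name and the tmpach table prefix are assumptions based on the log above):

library(DatabaseConnector)

connection <- connect(connectionDetails)

# List the tables in the scratch schema and drop the leftover Achilles scratch tables.
tables <- getTableNames(connection, databaseSchema = "hive_metastore.temp_db")
scratchTables <- tables[grepl("tmpach", tables, ignore.case = TRUE)]
for (tableName in scratchTables) {
  renderTranslateExecuteSql(
    connection,
    sql = "DROP TABLE @schema.@table;",
    schema = "hive_metastore.temp_db",
    table = tableName
  )
}

disconnect(connection)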

lcpinheiro commented 2 years ago

Ran it again. There was no error message for the table creation, and I think I have all of the tables. The last line of the log file was: "Done. Achilles results can now be found in schema hive_metastore.temp_db."

There is one error in the console: Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.

I'm guessing this is the server kicking me out (?) and it doesn't seem to affect the results.

lcpinheiro commented 2 years ago

Indeed, the error below occurs after most jobs. For example, exportResultsToCSV() also triggers it, but the results are still exported.

Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.

schuemie commented 2 years ago

The error message mentions getCatalogs(), which is currently only called to populate the Connections tab in RStudio. I'm guessing this tab doesn't work with Spark right now? As soon as I have a Spark testing server, I can debug this. It should not interfere with Achilles.

lcpinheiro commented 2 years ago

@schuemie, it did not interfere with Achilles in the end. However, trying to use AchillesWeb with the resulting data throws an error similar to issue #277, which was opened in 2018. Would you know if there was a solution? It seems it was closed because AchillesWeb isn't used much anymore.
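
For completeness, the AchillesWeb export step looks roughly like this (a sketch; the output path is a placeholder, and the exportToJson() arguments should be checked against the documentation of your Achilles version):

library(Achilles)

# Export the Achilles results to the JSON files that AchillesWeb reads.
# Schema names follow the error log below; outputPath is a placeholder.
exportToJson(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "hive_metastore.imrd_uk_202205",
  resultsDatabaseSchema = "hive_metastore.temp_db",
  outputPath = "achilles_json"
)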

schuemie commented 2 years ago

I'm not aware of a solution. The error message in #277 was very specific to Impala. Could you provide the exact error messages you're seeing for Spark?

lcpinheiro commented 2 years ago

@schuemie see below, thanks!

DBMS: spark

Error: java.sql.SQLException: [Simba]SparkJDBCDriver Error during translation: code HYC00, type Driver not capable, message Impala only supports left associative joins

SQL: select c1.concept_id as CONCEPT_ID, c1.concept_name as CONCEPT_NAME, cast(ROUND(CAST((100.0*num.count_value / denom.count_value) AS float),0) as bigint) as Y_NUM_PERSONS, num.stratum_2 as X_COUNT from (select count_value from hive_metastore.temp_db.achilles_results where analysis_id = 1) denom, (select CAST(stratum_1 as bigint) stratum_1, CAST(stratum_2 as bigint) stratum_2, count_value from hive_metastore.temp_db.achilles_results where analysis_id = 791) num inner join hive_metastore.imrd_uk_202205.concept c1 on num.stratum_1 = c1.concept_id order by num.stratum_2

R version: R version 4.2.1 (2022-06-23 ucrt)

Platform: x86_64-w64-mingw32

Attached base packages:

Other attached packages:

schuemie commented 2 years ago

Thanks! This really doesn't make any sense: the Spark driver is throwing an Impala error, one that returns 0 hits on Google. The SQL also looks fairly straightforward.

I really need access to a Spark database to debug this. I've requested a testing server, but this may take some time.

lcpinheiro commented 2 years ago

@schuemie thanks! I'm going to close the issue because the main concern was addressed by the changes you pushed, which I tested successfully.

schuemie commented 2 years ago

@lcpinheiro: I've been provided with a Spark testing server (many thanks to @leeevans!). We're unable to reproduce the error there.

Could you help us understand your setup? Are you running Spark on Impala?