Could you provide a bit more of the SQL statement (where you now say 'Truncated'), so we can find the specific original SQL statement that has the translation issue?
DBMS: spark
Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
== SQL ==
Thanks! This appears to be a SqlRender issue: https://github.com/OHDSI/SqlRender/issues/307
I think I've now fixed it. Unfortunately, I don't have a Spark instance to test against. Could you give it a try? You'll need to install the develop version of SqlRender:
remotes::install_github("ohdsi/SqlRender", ref="develop")
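If you want to verify the translation directly before re-running the full analysis, something like this should show whether the offending statement now renders correctly for Spark (a minimal sketch; the SQL below is a placeholder, paste the actual statement from your error log instead):

library(SqlRender)

# Placeholder SQL -- substitute the original statement from the error log:
sql <- "SELECT * FROM @cdm_database_schema.person;"
rendered <- render(sql, cdm_database_schema = "hive_metastore.imrd_uk_202205")
translate(rendered, targetDialect = "spark")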
Ran it again and did not hit the previous error; however, it does throw an error at the end of the analysis:
DBMS: spark
Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: drop index(line 1, pos 0)
== SQL ==
drop index hive_metastore.temp_db.idx_ar_aid
^^^
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:47)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:435)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:257)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:52)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:235)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:220)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:269)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: drop index(line 1, pos 0)
== SQL ==
drop index hive_metastore.temp_db.idx_ar_aid
^^^
at org.apache.spark.sql.errors.QueryParsingErrors$.operationNotAllowedError(QueryParsingErrors.scala:276)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:50)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitFailNativeCommand$1(SparkSqlParser.scala:509)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:498)
at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitFailNativeCommand(SparkSqlParser.scala:95)
at org.apache.spark.sql.catalyst.parser.SqlBaseParser$FailNativeCommandContext.accept(SqlBaseParser.java:1048)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:87)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:87)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$2(ParseDriver.scala:109)
at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:157)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:109)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:142)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:88)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:106)
at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:77)
at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:97)
at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:74)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$3(SparkExecuteStatementOperation.scala:337)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:151)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$2(SparkExecuteStatementOperation.scala:337)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:968)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$compileQuery$1(SparkExecuteStatementOperation.scala:334)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:327)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.compileQuery(SparkExecuteStatementOperation.scala:334)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:390)
... 16 more
, Query: drop index hive_metastore.temp_db.idx_ar_aid.
SQL: drop index hive_metastore.temp_db.idx_ar_aid
R version: R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32
Attached base packages:
Other attached packages:
The console error is:
Error in writeChar(report, fileConn, eos = NULL) : invalid 'nchars' argument
Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.
Well, I think that is progress?
The new error message seems to be about dropping an index:
Operation not allowed: drop index(line 1, pos 0)
It appears Achilles, as a precaution, is trying to drop indices before creating them. I don't think Spark even supports indices?
You could probably avoid all of this by using createIndices = FALSE when calling achilles(). Could you give that a try?
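For example (a sketch only; connectionDetails is assumed to have been created earlier with DatabaseConnector::createConnectionDetails(), and the schema names are taken from this thread):

library(Achilles)

achilles(connectionDetails,
         cdmDatabaseSchema = "hive_metastore.imrd_uk_202205",
         resultsDatabaseSchema = "hive_metastore.temp_db",
         createIndices = FALSE)  # skip the DROP/CREATE INDEX statements Spark rejects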
Indeed, it is!
I'll run it again. However, when I used the dropAllScratchTables function, I noticed that it is not removing them, but there's also no error generated, only warnings. 31 of these:
Temp table name '#s_f9stap8cs_dist_1816' is too long. Temp table names should be shorter than 22 characters to prevent Oracle from crashing.
I'll write some code to remove them and then run Achilles again. Thanks!
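Roughly what I have in mind (a sketch, assuming the leftover scratch tables keep the s_ prefix seen in the warnings; adjust the pattern to what's actually in the schema):

library(DatabaseConnector)

connection <- connect(connectionDetails)
tables <- getTableNames(connection, databaseSchema = "hive_metastore.temp_db")
# Leftover scratch tables appear to use the s_ prefix; adjust as needed:
scratchTables <- tables[grepl("^s_", tables, ignore.case = TRUE)]
for (table in scratchTables) {
  renderTranslateExecuteSql(connection,
                            "DROP TABLE @schema.@table;",
                            schema = "hive_metastore.temp_db",
                            table = table)
}
disconnect(connection)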
Ran it again. There was no error message during table creation, and I think I have all the tables. The last line of the log file was: Done. Achilles results can now be found in schema hive_metastore.temp_db.
There is one error in the console: Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.
I'm guessing this is the server kicking me out (?) and it doesn't seem to affect the results.
Indeed, the error below occurs after most jobs. E.g., exportResultsToCSV() also triggers it, but the results are exported.
Error in rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs") : java.sql.SQLException: [Simba]SparkJDBCDriver Error getting catalog information: Metadata Initialization Error.
The error message mentions getCatalogs(), which is currently only called to populate the Connections tab in RStudio. I'm guessing this tab doesn't work with Spark right now? As soon as I have a Spark testing server I can debug. It should not interfere with Achilles.
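If you want to confirm it's the driver's metadata call and nothing in Achilles, the failing call can be reproduced in isolation (a sketch; @jConnection is the underlying JDBC connection slot on a DatabaseConnector connection object):

library(DatabaseConnector)

connection <- connect(connectionDetails)
# The same metadata call RStudio's Connections tab makes:
metaData <- rJava::.jcall(connection@jConnection,
                          "Ljava/sql/DatabaseMetaData;",
                          "getMetaData")
rJava::.jcall(metaData, "Ljava/sql/ResultSet;", "getCatalogs")
disconnect(connection)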
@schuemie, it did not interfere with Achilles in the end. However, trying to use AchillesWeb with the resulting data throws an error similar to issue #277, which was opened in 2018. Would you know if there was a solution? It seems the issue was closed because AchillesWeb isn't used frequently.
I'm not aware of a solution. The error message in #277 was very specific to Impala. Could you provide the exact error messages you're seeing for Spark?
@schuemie see below, thanks!
DBMS: spark
Error: java.sql.SQLException: [Simba]SparkJDBCDriver Error during translation: code HYC00, type Driver not capable, message Impala only supports left associative joins
SQL: select c1.concept_id as CONCEPT_ID, c1.concept_name as CONCEPT_NAME,
  cast(ROUND(CAST((100.0*num.count_value / denom.count_value) AS float),0) as bigint) as Y_NUM_PERSONS,
  num.stratum_2 as X_COUNT
from
  (select count_value from hive_metastore.temp_db.achilles_results where analysis_id = 1) denom,
  (select CAST(stratum_1 as bigint) stratum_1, CAST(stratum_2 as bigint) stratum_2, count_value
   from hive_metastore.temp_db.achilles_results where analysis_id = 791) num
inner join hive_metastore.imrd_uk_202205.concept c1 on num.stratum_1 = c1.concept_id
order by num.stratum_2
R version: R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32
Attached base packages:
Other attached packages:
Thanks! This really doesn't make any sense: the Spark driver is throwing an Impala error, one that returns 0 hits on Google. The SQL also looks fairly straightforward.
I really need access to a Spark database to debug this. I've requested a testing server, but this may take some time.
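One thing worth trying in the meantime, purely speculative: the failing query mixes an old-style comma join with an ANSI inner join, and I wonder whether the driver's "left associative joins" complaint is about that combination. Running a hand-rewritten version with only explicit joins would tell us (a sketch; connection as created with DatabaseConnector::connect(), and this is not what SqlRender currently emits):

# Speculative rewrite for debugging: the same query with the implicit
# comma join replaced by an explicit CROSS JOIN, run directly over JDBC:
sql <- "
select c1.concept_id, c1.concept_name,
  cast(round(cast(100.0 * num.count_value / denom.count_value as float), 0) as bigint) as y_num_persons,
  num.stratum_2 as x_count
from (select count_value from hive_metastore.temp_db.achilles_results where analysis_id = 1) denom
cross join (select cast(stratum_1 as bigint) stratum_1, cast(stratum_2 as bigint) stratum_2, count_value
            from hive_metastore.temp_db.achilles_results where analysis_id = 791) num
inner join hive_metastore.imrd_uk_202205.concept c1 on num.stratum_1 = c1.concept_id
order by num.stratum_2;"
DatabaseConnector::querySql(connection, sql)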
@schuemie thanks! I'm going to close the issue, because the main concern was addressed with the changes you pushed, which I tested successfully.
@lcpinheiro : I've been provided with a Spark testing server (many thanks to @leeevans !). We're unable to reproduce the error there.
Could you help us understand your setup? Are you running Spark on Impala?
I ran Achilles on a local instance of R/RStudio connecting to a Spark SQL database in Azure Databricks. There were 22 errors, all of which seem to be related to the SqlRender translation.
Illustrative error log (truncated):
----Truncated----- Error: java.sql.SQLException: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
== SQL ==