Closed lcpinheiro closed 1 year ago
If I let the native query through, this is the error I get:
Error: nanodbc/nanodbc.cpp:1655: 42000: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near '?'(line 2, pos 8)
== SQL ==
INSERT INTO temp_db.mltcuzqqsbip_test_table (`chr_col`, `numeric_col`)
VALUES (?, ?)
--------^^^
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStat
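The statement that fails is a parameterized insert (the '?' markers in the VALUES clause). As a rough check of whether the parameter binding itself is the problem, one could build the same insert with the values inlined instead of bound, something like this (just a sketch using the table and column names from the error above and made-up values; I haven't confirmed this against Databricks):

# sketch: inline the values so the SQL sent to Spark contains no '?' markers
sql <- DBI::sqlInterpolate(
  con,
  "INSERT INTO temp_db.mltcuzqqsbip_test_table (`chr_col`, `numeric_col`) VALUES (?chr, ?num)",
  chr = "a",   # made-up character value
  num = 1      # made-up numeric value
)
DBI::dbExecute(con, sql)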
Thank you for reporting this @lcpinheiro. I apologize for the delay. I'll do some digging. Feel free to tag me on any database-related Darwin issues you need attention on. This is the first issue on this repo and I realized I was not getting notifications for it!
Hi @lcpinheiro,
I found an issue with comments in the SQL code generated by the OHDSI tools for cohort generation. I've made some changes to CDMConnector to try and fix this and have a working reprex for you to try.
Please install the new branch I created for this issue with remotes::install_github("OdyOSG/CDMConnector", ref = "issue_122") (I had to change the issue number to match the private repo) and try running the example below.
# install the branch for this issue
# remotes::install_github("OdyOSG/CDMConnector", ref = "issue_122")
library(CDMConnector)

con <- DBI::dbConnect(odbc::odbc(), dsn = "Databricks")

writeSchema <- "omop531results"
cdmSchema <- "omop531"

cdm <- cdmFromCon(con,
                  cdmSchema = cdmSchema,
                  cdmTables = tbl_group("default"),
                  writeSchema = writeSchema)

cohortSet <- readCohortSet(system.file("cohorts2", package = "CDMConnector", mustWork = TRUE))[3,]

cdm <- generateCohortSet(cdm,
                         cohortSet,
                         name = "chrt0",
                         computeAttrition = TRUE,
                         overwrite = TRUE)

cdm$chrt0
#> # Source: table<`omop531results`.`chrt0`> [0 x 4]
#> # Database: Spark SQL 3.2.1[token@Spark SQL/hive_metastore]
#> # … with 4 variables: cohort_definition_id <int>, subject_id <int64>,
#> #   cohort_start_date <date>, cohort_end_date <date>

cohort_set(cdm$chrt0)
#> # Source: table<`omop531results`.`chrt0_set`> [1 x 2]
#> # Database: Spark SQL 3.2.1[token@Spark SQL/hive_metastore]
#>   cohort_definition_id cohort_name
#>                  <int> <chr>
#> 1                    3 GIBleed_male

DBI::dbDisconnect(con)
Created on 2023-03-24 with reprex v2.0.2
You can see the changes I made here if you like: https://github.com/OdyOSG/CDMConnector/pull/16/files
I'm removing comments from the cohort generation SQL.
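Roughly, the change strips comments out of the generated SQL before it is sent to Spark. The actual code is in the PR linked above; a simplified sketch of the idea (not the exact implementation, and note it would also strip "--" sequences inside string literals) looks like this:

# simplified sketch of removing SQL comments before execution
remove_sql_comments <- function(sql) {
  # drop /* ... */ block comments, including multi-line ones
  sql <- gsub("/\\*[\\s\\S]*?\\*/", " ", sql, perl = TRUE)
  # drop -- line comments up to the end of the line
  sql <- gsub("--[^\n]*", "", sql)
  sql
}

cat(remove_sql_comments("SELECT 1 -- a comment\n/* another\ncomment */ FROM t"))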
@lcpinheiro Any update?
Hey @ablack3, Thanks! I'll be looking at it today. Let you know shortly!
Hi @ablack3,
Just an update, I still have the same error. There seems to be an issue with this bit of code in the cdmFromCon() function.
# write a small test table, then read it back to check the round trip
df1 <- data.frame(chr_col = "a", numeric_col = 1)
# ROracle does not support integer round trip
DBI::dbWriteTable(con, DBI::SQL(tablename), df1)
withr::with_options(list(databaseConnectorIntegerAsNumeric = FALSE), {
  df2 <- DBI::dbReadTable(con, DBI::SQL(tablename))
  names(df2) <- tolower(names(df2))
})
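If it helps narrow things down, a minimal way to try just this step outside of CDMConnector would be something like the following (the table name is just a placeholder I made up):

# hypothetical standalone round-trip check using a placeholder table name
tablename <- "temp_db.roundtrip_test"
df1 <- data.frame(chr_col = "a", numeric_col = 1)
DBI::dbWriteTable(con, DBI::SQL(tablename), df1)
DBI::dbReadTable(con, DBI::SQL(tablename))
DBI::dbRemoveTable(con, DBI::SQL(tablename))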
It could be a setting from our side. Will get back to this next week.
Ok let me check again.
Hey @ablack3,
Your solution seems to work. The other issue I raised (https://github.com/darwin-eu/CDMConnector/issues/3#issuecomment-1492182896) is related to choosing "Use Native Query" in the ODBC settings; once that option is removed, the error goes away. Databricks seems to have a bit of a problem with native query: with JDBC it's required, but with ODBC it seems detrimental.
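For reference, rather than editing the DSN dialog, the setting can probably also be passed in the connection call. I believe the Simba/Databricks ODBC driver exposes it as a UseNativeQuery parameter (that name is an assumption on my part):

# sketch: disable native query pass-through in the connection call
# (UseNativeQuery is assumed to be the Simba/Databricks ODBC driver setting)
con <- DBI::dbConnect(odbc::odbc(),
                      dsn = "Databricks",
                      UseNativeQuery = 0)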
I did have an issue with the cdmFromCon() and cdm_from_con() functions; one of them doesn't seem to work:
> cdm <- cdm_from_con(con,
+                     cdmSchema = cdmSchema,
+                     cdmTables = tbl_group("default"),
+                     writeSchema = writeSchema)
Error in cdm_from_con(con, cdmSchema = cdmSchema, cdmTables = tbl_group("default"), :
  unused arguments (cdmSchema = cdmSchema, cdmTables = tbl_group("default"), writeSchema = writeSchema)
> cdm <- cdmFromCon(con,
+                   cdmSchema = cdmSchema,
+                   cdmTables = tbl_group("default"),
+                   writeSchema = writeSchema)
>
Thanks for the info. What is a "native query"?
For cdm_from_con() the argument names also need to be in snake case.
cdm <- cdm_from_con(con,
                    cdm_schema = cdmSchema,
                    cdm_tables = tbl_group("default"),
                    write_schema = writeSchema)
@lcpinheiro Can we close this or are you still having a problem?
Closing this and assuming the issue is fixed. Reopen if needed.
The generateCohortSet() function, calling a Spark database on Databricks through an ODBC connection, threw the error below. I have disabled the "Enable translation for CTAS" option in the ODBC Data Source Administrator, but this didn't sort the issue.
Any suggestions?