darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0

Databricks database access issue #3

Closed lcpinheiro closed 1 year ago

lcpinheiro commented 1 year ago

The generateCohortSet() function, calling a Spark database on Databricks through an ODBC connection, threw the error below:

cdm <- generateCohortSet(cdm, ma_cohort_set, name = "ma", computeAttrition = TRUE, 
    overwrite = TRUE)
Error in connection_begin(conn@ptr) : 
  nanodbc/nanodbc.cpp:1199: HYC00: [Simba][ODBC] (11470) Transactions are not supported. 
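
The error appears to come from an explicit transaction being opened on the connection. A minimal sketch (the DSN name is illustrative) that should trigger the same driver message:

# Hypothetical reproduction: Spark ODBC drivers reject explicit transactions,
# so dbWithTransaction() (which calls dbBegin(), i.e. connection_begin(),
# under the hood) should raise the same nanodbc error.
library(DBI)
con <- dbConnect(odbc::odbc(), dsn = "Databricks")  # illustrative DSN name
dbWithTransaction(con, {
  dbGetQuery(con, "SELECT 1")
})
dbDisconnect(con)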

I have disabled the "Enable translation for CTAS" option in the ODBC Data Source Administrator, but this didn't sort the issue.

Any suggestions?

lcpinheiro commented 1 year ago

If I let the native query through, this is the error I get:

Error: nanodbc/nanodbc.cpp:1655: 42000: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: [PARSE_SYNTAX_ERROR] org.apache.spark.sql.catalyst.parser.ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near '?'(line 2, pos 8)

== SQL ==
INSERT INTO temp_db.mltcuzqqsbip_test_table (`chr_col`, `numeric_col`)
VALUES (?, ?)
--------^^^

    at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:48)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStat
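
The statement that fails is the parameterized INSERT that DBI uses when writing a table. A minimal sketch (the DSN and table names are illustrative) of the kind of call that produces it:

# Hypothetical illustration: DBI::dbWriteTable() sends a parameterized
# INSERT ... VALUES (?, ?), which the Spark SQL parser rejects when the
# driver passes the statement through natively.
library(DBI)
con <- dbConnect(odbc::odbc(), dsn = "Databricks")   # illustrative DSN name
df <- data.frame(chr_col = "a", numeric_col = 1)
dbWriteTable(con, SQL("temp_db.test_table"), df)     # illustrative table name
dbDisconnect(con)
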
ablack3 commented 1 year ago

Thank you for reporting this @lcpinheiro. I apologize for the delay. I'll do some digging. Feel free to tag me on any database-related Darwin issues that need attention. This is the first issue on this repo, and I realized I was not getting notifications for it!

ablack3 commented 1 year ago

Hi @lcpinheiro,

I found an issue with comments in the SQL code generated by the OHDSI tools for cohort generation. I've made some changes to CDMConnector to try and fix this and have a working reprex for you to try.

Please install the new branch I created for this issue with remotes::install_github("OdyOSG/CDMConnector", ref = "issue_122") (I had to change the issue number to match the private repo) and try running the example below.

# install the branch for this issue
# remotes::install_github("OdyOSG/CDMConnector", ref = "issue_122")

library(CDMConnector)
con <- DBI::dbConnect(odbc::odbc(), dsn = "Databricks")

writeSchema <- "omop531results"
cdmSchema <- "omop531"

cdm <- cdmFromCon(con,
                  cdmSchema = cdmSchema,
                  cdmTables = tbl_group("default"),
                  writeSchema = writeSchema)

cohortSet <- readCohortSet(system.file("cohorts2", package = "CDMConnector", mustWork = TRUE))[3,]

cdm <- generateCohortSet(cdm,
                         cohortSet,
                         name = "chrt0",
                         computeAttrition = TRUE,
                         overwrite = TRUE)

cdm$chrt0
#> # Source:   table<`omop531results`.`chrt0`> [0 x 4]
#> # Database: Spark SQL 3.2.1[token@Spark SQL/hive_metastore]
#> # … with 4 variables: cohort_definition_id <int>, subject_id <int64>,
#> #   cohort_start_date <date>, cohort_end_date <date>
cohort_set(cdm$chrt0)
#> # Source:   table<`omop531results`.`chrt0_set`> [1 x 2]
#> # Database: Spark SQL 3.2.1[token@Spark SQL/hive_metastore]
#>   cohort_definition_id cohort_name 
#>                  <int> <chr>       
#> 1                    3 GIBleed_male
DBI::dbDisconnect(con)

Created on 2023-03-24 with reprex v2.0.2

ablack3 commented 1 year ago

You can see the changes I made here if you like: https://github.com/OdyOSG/CDMConnector/pull/16/files

I'm removing comments from the cohort generation SQL.
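
Roughly, the idea is to strip line comments before the SQL is translated and sent to the driver. A simplified sketch of the approach (not the exact code in the pull request):

# Simplified sketch: drop "--" line comments so the translated SQL does not
# trip up the Spark driver. (Naive: this would also strip "--" occurring
# inside string literals, which the real implementation needs to handle.)
strip_sql_comments <- function(sql) {
  lines <- strsplit(sql, "\n", fixed = TRUE)[[1]]
  lines <- sub("--.*$", "", lines)
  paste(trimws(lines, which = "right"), collapse = "\n")
}
strip_sql_comments("SELECT 1 -- a comment\nFROM my_table")
#> [1] "SELECT 1\nFROM my_table"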

ablack3 commented 1 year ago

@lcpinheiro Any update?

lcpinheiro commented 1 year ago

Hey @ablack3, thanks! I'll be looking at it today. I'll let you know shortly!

lcpinheiro commented 1 year ago

Hi @ablack3,

Just an update: I still have the same error. There seems to be an issue with this bit of code in the cdmFromCon() function:

  df1 <- data.frame(chr_col = "a", numeric_col = 1)
  # ROracle does not support integer round trip
  DBI::dbWriteTable(con, DBI::SQL(tablename), df1)

  withr::with_options(list(databaseConnectorIntegerAsNumeric = FALSE), {
    df2 <- DBI::dbReadTable(con, DBI::SQL(tablename))
    names(df2) <- tolower(names(df2))
  })
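
For reference, that round-trip check can be run by hand against the connection to isolate it from the rest of cdmFromCon(). A standalone sketch (the table name is illustrative):

# Standalone version of the round-trip check above (illustrative table name).
tablename <- "omop531results.cdmconnector_check"
df1 <- data.frame(chr_col = "a", numeric_col = 1)
DBI::dbWriteTable(con, DBI::SQL(tablename), df1)
df2 <- DBI::dbReadTable(con, DBI::SQL(tablename))
DBI::dbExecute(con, paste("DROP TABLE", tablename))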

It could be a setting on our side. I'll get back to this next week.

ablack3 commented 1 year ago

Ok let me check again.

lcpinheiro commented 1 year ago

Hey @ablack3,

Your solution seems to work. The other error I raised (https://github.com/darwin-eu/CDMConnector/issues/3#issuecomment-1492182896) is related to choosing "Use native query" in the ODBC settings. Once that option is removed, the issue goes away. Databricks seems to have a bit of a problem with native query: with JDBC it's required, but with ODBC it seems detrimental.

I did have an issue with the cdmFromCon() and cdm_from_con() functions. One of them doesn't seem to work:

> cdm <- cdm_from_con(con,
+                   cdmSchema = cdmSchema,
+                   cdmTables = tbl_group("default"),
+                   writeSchema = writeSchema)
Error in cdm_from_con(con, cdmSchema = cdmSchema, cdmTables = tbl_group("default"),  : 
  unused arguments (cdmSchema = cdmSchema, cdmTables = tbl_group("default"), writeSchema = writeSchema)
> cdm <- cdmFromCon(con,
+                   cdmSchema = cdmSchema,
+                   cdmTables = tbl_group("default"),
+                   writeSchema = writeSchema)
> 
ablack3 commented 1 year ago

Thanks for the info. What is a "native query"?

For cdm_from_con() the argument names also need to be in snake_case:

cdm <- cdm_from_con(con,
                    cdm_schema = cdmSchema,
                    cdm_tables = tbl_group("default"),
                    write_schema = writeSchema)

ablack3 commented 1 year ago

@lcpinheiro Can we close this or are you still having a problem?

ablack3 commented 1 year ago

Closing this and assuming the issue is fixed. Reopen if needed.