darwin-eu / CDMConnector

A pipe friendly way to interact with an OMOP Common Data Model
https://darwin-eu.github.io/CDMConnector/
Apache License 2.0
12 stars 10 forks source link

[CDMConnector/Simba ODBC Driver] - Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: REPLACE TABLE AS SELECT is only supported with v2 tables. #23

Open cxndia23 opened 3 months ago

cxndia23 commented 3 months ago

We need to use CDMConnector to process datas with R Scripts. Our contained these before adding implementation for CDMConnector : Thrift Server -interface-> Spark/spark_catalog <-sync-> HiveCatalog/Hive Warehouse image (7)

Then, We began The configuration : ODBC DRIVER

  1. Install and Configure the Simba Driver (from this link).
  2. Driver Configuration
    
    # /etc/odbc.ini

[ODBC Data Sources] HiveDSN=Our Hive Driver Simba Spark 64-bit=Simba Spark ODBC Driver 64-bit

[Simba Spark 64-bit] Description=Simba Spark ODBC Driver (64-bit) DSN Driver=Simba Spark ODBC Driver 64-bit HOST=thrift-spark-thrift-server.hive-thrift.svc PORT=10000 UseNativeQuery=0

/etc/odbcinst.ini

[ODBC Drivers] Simba Spark ODBC Driver 64-bit=Installed

[Simba Spark ODBC Driver 64-bit] Description=Simba Spark ODBC Driver (64-bit) Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so


**DBI INTERFACE**
We created then the DBI Interface, using : 
`odbc_connection <- DBI::dbConnect(odbc::odbc(), dsn="Simba Spark 64-bit", uid="", pwd="")`

Using this Interface, we created Database containing OMOP Format Tables (from OMOP Format CSV Files)
##1. Read CSV File

csv_source_file <- "examplefile.csv" dataframe <- read.csv(csv_source_file, header = TRUE)

2. Write CSV file in Frame

dbWriteTable(odbc_connection, "testdataframe", dataframe, overwrite = TRUE)

3. Create the OMOP Format Database

dbExecute(odbc_connection, "CREATE DATABASE IF NOT EXISTS omopformatdatabase")

4. Create Table from the Frame in our Database

dbExecute(odbc_connection, "CREATE TABLE omopformatdatabase.testtableAS SELECT * FROM testdataframe")

5. Use OMOP Format Database

dbExecute(odbc_connection, "USE omopformatdatabase")


At this step, we had created : 

- our dataframe created from the OMOP Format CSV File;
- our database;
- our table created in the Database from the Data Frame

**CDMCONNECTOR**
Assuming that CDMConnector would work on OMOP Format Database, we run this : 
`cdm_connector <- CDMConnector::cdm_from_con(con = odbc_connection, cdm_schema = "omopformatdatabase", write_schema = "omopformatdatabase",  cdm_name = "theCDMonv2")`

We have this Error : 

Error in CDMConnector::cdm_from_con(): ! ODBC failed with error 00000 from [Simba][Hardy]. ✖ (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: REPLACE TABLE AS SELECT is only supported with v2 tables. • at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveT • 'CREATE OR REPLACE TABLE omopformatdatabase.ugxuw_test_table AS SELECT * • FROM omopformatdatabase.tkawl_temp_table' ℹ From nanodbc/nanodbc.cpp:1722. Run rlang::last_trace() to see where the error occurred.



Version of Services : 

- Spark : 3.2.0 (with matching Hive and Thrift for this version)
- CDMConnector : 1.5.0
- Simba Driver : 2.8.2
ablack3 commented 1 day ago

I think the error you see is happening because you are using v1 tables (Hive tables). If you are able to use v2 tables then I think the 'create or replace' syntax should work.

Try converting all the cdm tables to delta using this syntax.

CONVERT TO DELTA `omopformatdatabase.ugxuw_test_table`;

Then try recreating the cdm.