OHDSI / Achilles

Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) - descriptive statistics about a OMOP CDM database
https://ohdsi.github.io/Achilles/
130 stars 122 forks source link

NLS_DATE_FORMAT conflict in Oracle system #225

Closed paulheider closed 6 years ago

paulheider commented 6 years ago

We're developing against an OMOP CDM v5 running in Oracle.

I had difficulties running through the initial demos because of a date formatting error.

RStudio complained about 'java.sql.SQLDataException: ORA-01861: literal does not match format string' and could only get 44% through the initial Achilles run.

According to my colleague, the underlying problem was tied to the use of CAST on an Oracle datefield in the SQL command shown in the errorReport.txt file (below). CASE is good because it is ANSI standard sql but it presupposes the date format is 'YYYYMMDD'.

When I forcibly changed the NLS_DATE_FORMAT from 'dd-mon-yyyy' to 'YYYYMMDD', then everything worked fine.

As a quick hack, I changed R/Achilles.R lines 129-137 like so:

  } else {
    conn <- DatabaseConnector::connect(connectionDetails)
    writeLines( "Setting NLS_DATE_FORMAT for this session" )
    DatabaseConnector::executeSql( conn , 'alter session set NLS_DATE_FORMAT="YYYYMMDD"' )
    writeLines("Executing multiple queries. This could take a while.")
    #writeSql(achillesSql, 'achillesDebug.sql');
    DatabaseConnector::executeSql(conn,achillesSql)
    writeLines(paste("Done. Achilles results can now be found in",resultsDatabaseSchema))
  }

The demo worked as expected and correctly generated the results entries. I don't know if a change like the one I made above (which appropriate checks) needs to be integrated into the main pipeline or a clearer explanation needs to be added to the tutorial page about how to verify that you're using a date format compliant with the expectations of Achilles.

Expected behavior

Connecting using Oracle driver
- using THIN to connect
Executing multiple queries. This could take a while.
  |==========================================================================================| 100%
Executing SQL took 25.1 secs
Done. Achilles results can now be found in <schema>
Executing Achilles Heel. This could take a while
  |==========================================================================================| 100%
Executing SQL took 1.39 secs
Done. Achilles Heel results can now be found in <schema>

Actual behavior

Connecting using Oracle driver
- using THIN to connect
Executing multiple queries. This could take a while
  |=======================================                                                   |  44%
Error: Error executing SQL:
java.sql.SQLDataException: ORA-01861: literal does not match format string

Generated errorReport.txt:

DBMS:
oracle

Error:
java.sql.SQLDataException: ORA-01861: literal does not match format string

SQL:
CREATE TABLE <schema>.t1ngmkfotemp_dates
  AS
SELECT
DISTINCT 
  EXTRACT(YEAR FROM observation_period_start_date) AS obs_year,
  CAST(CONCAT(CONCAT(TO_CHAR(EXTRACT(YEAR FROM observation_period_start_date) ), '01'), '01') AS DATE) AS obs_year_start,
  CAST(CONCAT(CONCAT(TO_CHAR(EXTRACT(YEAR FROM observation_period_start_date) ), '12'), '31') AS DATE) AS obs_year_end

FROM
<schema>.observation_period

Steps to reproduce behavior

library(Achilles)

connectionDetails <- createConnectionDetails( dbms="oracle",
                                              server="<server>/<database>", 
                                              user="...",
                                              password='...', 
                                              schema="<schema>", 
                                              port="1521")

achillesResults <- achilles(connectionDetails, 
                            cdmDatabaseSchema="<schema>", 
                            cdmVersion = "5")
t-abdul-basser commented 6 years ago

@paulheider Thank you for bringing this to our attention.

vojtechhuser commented 6 years ago

one option is to add to R code something like if ("dialect is Oracle") DatabaseConnector::executeSql( conn , 'alter session set NLS_DATE_FORMAT="YYYYMMDD"' )

Does the setting persist for the next connection? WHere to add it may need attention.

Since the command would confuse other platforms.

paulheider commented 6 years ago

It appears that the sessions set persists until the next disconnect.

I updated my local Achilles.R as per below. The first if checks for dialect and the second if is a throw-away flag I added so that I could run the achilles() multiple times in a row to see if the NLS_DATE_FORMAT was same between runs.

Achilles.R lines 129-142

  } else {
    conn <- DatabaseConnector::connect(connectionDetails)
    if( connectionDetails$dbms == "oracle" ){
        if( set_nls ){
          writeLines( "Setting NLS_DATE_FORMAT for this session" )
          DatabaseConnector::executeSql( conn ,
                                        'alter session set NLS_DATE_FORMAT="YYYYMMDD"' )
        }
    }
    writeLines("Executing multiple queries. This could take a while.")
    #writeSql(achillesSql, 'achillesDebug.sql');
    DatabaseConnector::executeSql(conn,achillesSql)
    writeLines(paste("Done. Achilles results can now be found in",resultsDatabaseSchema))
  }

Running achilles() once (and setting NLS) works fine.

achillesResults <- achilles(connectionDetails, 
                            cdmDatabaseSchema="<schema>", 
                            cdmVersion = "5" ,
                            set_nls = TRUE )

Immediately re-running (with the set_nls flag to FALSE) fails as before at 44%.

achillesResults <- achilles(connectionDetails, 
                            cdmDatabaseSchema="<schema>", 
                            cdmVersion = "5" ,
                            set_nls = FALSE )

Re-running after this failure with set_nls = TRUE makes everything run fine again.

I posted a more production-worthy implementation below and on my forked branch:

Achilles.R lines 129-142

  } else {
    conn <- DatabaseConnector::connect(connectionDetails)
    writeLines("Executing multiple queries. This could take a while.")
    if( connectionDetails$dbms == "oracle" ){
        writeLines( "Setting NLS_DATE_FORMAT for this session" )
        DatabaseConnector::executeSql( conn ,
                                      'alter session set NLS_DATE_FORMAT="YYYYMMDD"' )
    }
    #SqlRender::writeSql(achillesSql, 'achillesDebug.sql');
    DatabaseConnector::executeSql(conn,achillesSql)
    writeLines(paste("Done. Achilles results can now be found in",resultsDatabaseSchema))
  }
t-abdul-basser commented 6 years ago

Thanks @paulheider. Please consider submitting this as pull request so that we can review for potential integration into the codebase.