OHDSI / Ares

A Research Exploration System
https://ohdsi.github.io/Ares/
Apache License 2.0
12 stars 6 forks source link

Snowflake numeric type issue with Ares Exporter - Achilles::exportToAres - for CDM version 5.4 #366

Open heschmidt04 opened 7 months ago

heschmidt04 commented 7 months ago

Describe the bug Ares Exporter fails on numeric type because temp table in memory has autosensing data type conversion issue. A search on github Ares issues didn't yield anything for me if this has happened before in export.

To Reproduce Steps to reproduce the behavior:

  1. Run Achilles generated SQL in Snowflake with sqlonly = T to create Achilles tables
  2. Run DQD in RStudio - authentication via external browser to Snowflake
  3. Run Ares Exporter in RStudio - authentication via external browser to Snowflake
  4. Run modified SQL from RStudio Ares Exporter error errorReportSql.txt in Snowflake

Expected behavior Ares Exporter to write files to output folder.
Is there a possible work around for this part?

Screenshots DBMS: snowflake

Error: net.snowflake.client.jdbc.SnowflakeSQLException: Numeric value '' is not recognized

Stackoverflow: https://stackoverflow.com/questions/70176093/numeric-value-is-not-recognized

-- SQL modified to debug 
CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom -- WORKS  
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

-- Debugging work 
CREATE OR REPLACE TABLE AO_export_denom AS
SELECT 
    CAST(stratum_1 as bigint) stratum_1
    , count_value 
FROM achilles_results 
WHERE analysis_id = 117 
GROUP BY analysis_id
        , stratum_1
        , count_value

-- Debugging work 
CREATE OR REPLACE TABLE AO_export_t1 AS 
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
-- Putting debug together 

SELECT   
      t1.table_name as SERIES_NAME
    , t1.stratum_1 as X_CALENDAR_MONTH
    , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
FROM AO_export_t1 AS t1 
JOIN AO_export_denom AS denom 
  ON t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

The work around works because the datatype is auto sensed

-- AO_EXPORT_DENOM

name    type    kind
STRATUM_1   NUMBER(38,0)    COLUMN
COUNT_VALUE NUMBER(38,0)    COLUMN

-- AO_EXPORT_T1

name    type    kind
TABLE_NAME  VARCHAR(20) COLUMN
STRATUM_1   NUMBER(38,0)    COLUMN
COUNT_VALUE NUMBER(38,0)    COLUMN

Desktop (please complete the following information): R version: R version 4.3.1 (2023-06-16) Platform: aarch64-apple-darwin20

Attached base packages:

Other attached packages:

Additional context Files that were created from the process are these. No error.txt file in the errors directory.

Mar 15 17:06 dq-result_camel.json Mar 15 17:06 log_DqDashboard_Snowflake-MGB-OMOP.txt -- binary file??? Not sure why. Mar 15 17:06 dq-result.json Mar 15 17:06 datadensity-total.csv Mar 15 17:06 records-by-domain.csv

thejasbh commented 3 months ago

I encountered the same issue when exporting data to Ares. After some investigation, I discovered that using the TRY_CAST function in Snowflake was an effective solution. By implementing TRY_CAST across all relevant scripts in the Achilles::exportToAres function, I was able to resolve the issue and ensure smooth data exports to Ares.

However, it's important to note that while this solution works well for Snowflake, it may not be suitable for other databases that do not support the TRY_CAST function. This can lead to compatibility issues across different database systems. Therefore, to make the scripts work universally across all databases, a more complex fix might be necessary.

Example of the modification:

Original script :

CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

Script with TRY_CAST:

CREATE TABLE AO_export_error AS 
select    t1.table_name as SERIES_NAME
        , t1.stratum_1 as X_CALENDAR_MONTH
        , round(1.0*t1.count_value/denom.count_value,5) as Y_RECORD_COUNT
from
(
        select 'Visit occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 220 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 420 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Death' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 502 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Procedure occurrence' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 620 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug exposure' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 720 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 820 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Drug era' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 920 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Condition era' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1020 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Observation period' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 111 GROUP BY analysis_id, stratum_1, count_value
        union all
        select 'Measurement' as table_name, **TRY_CAST**(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 1820 GROUP BY analysis_id, stratum_1, count_value
) t1
inner join
(select CAST(stratum_1 as bigint) stratum_1, count_value from achilles_results where analysis_id = 117 GROUP BY analysis_id, stratum_1, count_value) denom   
on t1.stratum_1 = denom.stratum_1
ORDER BY SERIES_NAME, t1.stratum_1

Note: The reason TRY_CAST works in this context is because, as mentioned by @heschmidt04, Snowflake attempts to auto-detect data types in tables. The TRY_CAST function includes built-in error handling, allowing the script to continue executing even if type conversion fails, unlike CAST which may halt execution on errors.