OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0
129 stars 167 forks source link

Drilldown Reports cast issue #2187

Closed alondhe closed 1 year ago

alondhe commented 1 year ago

Expected behavior

When Achilles has been executed with benchmark records, these records have decimal values stored in the stratum_1 fields of achilles_results and achilles_results_dist. These should not cause a casting issue for the drilldown report SQLs.

Actual behavior

In Redshift, we see this cause errors. As the drilldown queries utilize a casting of isnumeric, if the value is a decimal value, it will be numeric but not castable to int.

ERROR: Invalid digit, Value '.', Pos 2, Type: Integer 
  Detail: 
  -----------------------------------------------
  error:  Invalid digit, Value '.', Pos 2, Type: Integer 
  code:      1207
  context:   28.98
  query:     3749295
  location:  :0
  process:   query1_241_3749295 [pid=0]
  -----------------------------------------------

Steps to reproduce behavior

Run Achilles with benchmarks enabled. Try examining the drilldown reports in Data Sources. Observe no results, and then see the SQL errors in the WebAPI logs.

anthonysena commented 1 year ago

@alondhe - maybe an aside here but can you reference how to run Achilles with benchmarks enabled? Just curious as I was discussing a similar idea with @AnthonyMolinaro yesterday on how to detect if an Achilles analysis ran but produced no records. I suspect that such a benchmark concept could satisfy the need.

alondhe commented 1 year ago

I actually don't see a way to run without benchmarks enabled. We probably could stand to have that as a parameter in Achilles. But to your point, perhaps if benchmarks are enabled, a QC check could try to highlight something took > 0 seconds but resulted in 0 records?

anthonysena commented 1 year ago

But to your point, perhaps if benchmarks are enabled, a QC check could try to highlight something took > 0 seconds but resulted in 0 records?

Yes - that's the idea. We've been working on the DataDiagnostics package/module and one of the steps in that process aims to see if all of the necessary Achilles analyses have been run. The way it aims to detect this is by checking the achilles_results table(s) and if it does not find an analysis, it will run it. But sometimes an analysis did run but it produced 0 results. So at the moment we can't distinguish two states a) it ran with no results or b) if it was never run. So it can be a bit inefficient until we have a mechanism to know the state of an analysis.

AnthonyMolinaro commented 1 year ago

Hey guys, please have a look at the achilles_processing table.

The point of the table suggested by @fdefalco is to avoid the situation where the entire process fails because one query fails when building the giant UNION-ALL. I think we can kill two birds with one stone by adding an additional column, STATUS, to indicate whether a given analysis was skipped or not. There are two scenarios that would cause an analysis to not be present in achilles_results: (i) It was skipped (explicitly or by default) or (ii) It ran but resulted in "no data found". The additional column in the achilles_processing table will allow us to easily distinguish between the cases and avoid rerunning analyses unnecessarily.

I'll get this functionality added to Achilles.

alondhe commented 1 year ago

Hey @AnthonyMolinaro -- I think this approach is great. Moving the benchmarks out, as they don't really fit with the achilles results, and then giving us this status capability to essentially QC what was run would be huge.

And we can close this WebAPI issue in that case.

fdefalco commented 1 year ago

The benchmarks are used so we will have a way to maintain them going forward. Having a status as well would be a welcome addition. We do want to know what analyses were run, how long they took, and if they generated data or not as opposed to currently being unaware of if something ran in the case where no data was generated, in which case, we still want to know how long it took to generate nothing. :)

anthonysena commented 1 year ago

Profiling this issue reveals the error as reported originally. Including the full query here:

org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [SELECT
  c1.concept_id     AS concept_id,
  c2.concept_name   AS category,
  ard1.min_value    AS min_value,
  ard1.p10_value    AS p10_value,
  ard1.p25_value    AS p25_value,
  ard1.median_value AS median_value,
  ard1.p75_value    AS p75_value,
  ard1.p90_value    AS p90_value,
  ard1.max_value    AS max_value
FROM results.achilles_results_dist ard1
INNER JOIN
cdm.concept c1
ON CAST(CASE WHEN REGEXP_INSTR(ard1.stratum_1, '^[\-\+]?(\\d*\\.)?\\d+([Ee][\-\+]?\\d+)?$') = 1 THEN ard1.stratum_1 ELSE null END AS INT) = c1.concept_id
INNER JOIN
cdm.concept c2
ON CAST(CASE WHEN REGEXP_INSTR(ard1.stratum_2, '^[\-\+]?(\\d*\\.)?\\d+([Ee][\-\+]?\\d+)?$') = 1 THEN ard1.stratum_2 ELSE null END AS INT) = c2.concept_id
WHERE ard1.analysis_id = 706
AND ard1.count_value > 0
AND c1.concept_id = ?]; 

SQL state [XX000]; error code [500310]; [Amazon](500310) Invalid operation: Invalid digit, Value '.', Pos 2, Type: Integer 
Details: 
 -----------------------------------------------
  error:  Invalid digit, Value '.', Pos 2, Type: Integer 
  code:      1207
  context:   43.45
  query:     230492237
  location:  :0
  process:   query10_199_230492237 [pid=0]
  -----------------------------------------------;

From discussion with @alondhe @chrisknoll - we can revise the query above to use the analysis_id in the CASE statement to prevent these types of CASTing problems.