OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0
130 stars 169 forks source link

Drilldown data not consistently loading #2079

Open CyloNox opened 2 years ago

CyloNox commented 2 years ago

Expected behavior

When clicking a block in the treemap in the "data source" section a drilldown event is trigger to pull that data down and display in the sub grids

Actual behavior

No Data found displayed in the subsection

Steps to reproduce behavior

After researching and providing a quick fix on our end it is a combinations of updates that makes this happen. When we started using Achilles 1.7.0 there were several updates made but one in particular changed some values in the @results_database_schema.achilles_results_dist table. The one that is causing this issue is this one https://github.com/OHDSI/Achilles/commit/7bccb474bac5b60138e5481e2099315226133f50

It changed the fill-in value from something like "0.23 secs" to "0.23" in the stratum_1 column. By doing this it open the door for a catch that was working in the past to exclude these 44 rows to be evaluated by using the isnumeric() function. Unfortunately, the error we get is inconsistent on data sources and selections but we do know it's queries like this one that produce the error. The first time we experienced this was when it was running this .sql statement and it produced an error in the logs not being able to convert the value to INT. https://github.com/OHDSI/WebAPI/blob/master/src/main/resources/resources/cdmresults/sql/report/observation/drilldown/ageAtFirstOccurrence.sql

Quick short-term fix

We update all 44 rows with the values in the table from "0.23" back to "0.23 secs" and it works as expected.

Long term fix

The better approach would be to update all the sql statements like the one above by filtering out in the where clause of non-integer values. aka "ard1.stratum_1 NOT LIKE '%.%'" or something similar that the values don't get evaluated and passed as true in the overall statement and throws an error in the logs and displays no data to the user.

-Adam

chrisknoll commented 2 years ago

I think we've seen some trouble with the is_numeric and instead I think we're going to use a CASE statement to check on the analysisid in order to cast the stratum* columns to integer. Then we don't run into confusion about blindly checking the column value as numeric, when the real check should be based on the analysisid becaus the analysis specifies the type of data to be put into the stratum columns.

CyloNox commented 2 years ago

The current statement has that filtering of analysis_id already in there but the problem we suspect is that the engine in sql and perhaps this might just be a MS SQL issue evaluates the CASE statement prior to the WHERE clause. It's inconsistent with the results so perhaps how it's indexed in the background could be a reason why it sometimes comes and sometimes doesn't, we don't know for sure. The one thing we do know is that this error pops up every now and then.

SQL statement in WebAPI

SELECT
  c1.concept_id     AS concept_id,
  c2.concept_name   AS category,
  ard1.min_value    AS min_value,
  ard1.p10_value    AS p10_value,
  ard1.p25_value    AS p25_value,
  ard1.median_value AS median_value,
  ard1.p75_value    AS p75_value,
  ard1.p90_value    AS p90_value,
  ard1.max_value    AS max_value
FROM @results_database_schema.achilles_results_dist ard1
INNER JOIN @vocab_database_schema.concept c1 ON CAST(CASE WHEN isNumeric(ard1.stratum_1) = 1 THEN ard1.stratum_1 ELSE null END AS INT) = c1.concept_id
INNER JOIN @vocab_database_schema.concept c2 ON CAST(CASE WHEN isNumeric(ard1.stratum_2) = 1 THEN ard1.stratum_2 ELSE null END AS INT) = c2.concept_id
WHERE ard1.analysis_id = 806
AND c1.concept_id = @conceptId
chrisknoll commented 2 years ago

Yep, it's inconsistent across DBMS systems if the where is applied or the join is applied first. So, the thought is that to change the case statement to:

CAST(CASE WHEN analysis_id = 806 then THEN ard1.stratum_1 ELSE null END AS INT)
CyloNox commented 2 years ago

Oh, ok I understand now, thank you

CyloNox commented 2 years ago

We are in the process of using Databricks and the spark libraries to connect into Databricks and pull from delta tables. One item that has also been brought to our attention is that isNumeric() function is not allowed in spark sql.

https://sparkbyexamples.com/spark/spark-check-string-column-has-numeric-values/#:~:text=Unfortunately%2C%20Spark%20doesn't%20have,they%20do%20not%20perform%20well.

By not having that built-in function will actually help with this problem as well.

chrisknoll commented 2 years ago

Thanks for the update. I think the change from switching to use analysis_id was suggested elsewhere, but it's such a broad change, we wanted to wait for more consensus about the change, so I'll mark you down as a +1 for it :).