Closed alondhe closed 1 year ago
@alondhe - maybe an aside here but can you reference how to run Achilles with benchmarks enabled? Just curious as I was discussing a similar idea with @AnthonyMolinaro yesterday on how to detect if an Achilles analysis ran but produced no records. I suspect that such a benchmark concept could satisfy the need.
I actually don't see a way to run without benchmarks enabled. We probably could stand to have that as a parameter in Achilles. But to your point, perhaps if benchmarks are enabled, a QC check could try to highlight something took > 0 seconds but resulted in 0 records?
But to your point, perhaps if benchmarks are enabled, a QC check could try to highlight something took > 0 seconds but resulted in 0 records?
Yes - that's the idea. We've been working on the DataDiagnostics package/module and one of the steps in that process aims to see if all of the necessary Achilles analyses have been run. The way it aims to detect this is by checking the achilles_results table(s) and if it does not find an analysis, it will run it. But sometimes an analysis did run but it produced 0 results. So at the moment we can't distinguish two states a) it ran with no results or b) if it was never run. So it can be a bit inefficient until we have a mechanism to know the state of an analysis.
Hey guys, please have a look at the achilles_processing table.
The point of the table suggested by @fdefalco is to avoid the situation where the entire process fails because one query fails when building the giant UNION-ALL. I think we can kill two birds with one stone by adding an additional column, STATUS, to indicate whether a given analysis was skipped or not. There are two scenarios that would cause an analysis to not be present in achilles_results: (i) It was skipped (explicitly or by default) or (ii) It ran but resulted in "no data found". The additional column in the achilles_processing table will allow us to easily distinguish between the cases and avoid rerunning analyses unnecessarily.
I'll get this functionality added to Achilles.
Hey @AnthonyMolinaro -- I think this approach is great. Moving the benchmarks out, as they don't really fit with the achilles results, and then giving us this status capability to essentially QC what was run would be huge.
And we can close this WebAPI issue in that case.
The benchmarks are used so we will have a way to maintain them going forward. Having a status as well would be a welcome addition. We do want to know what analyses were run, how long they took, and if they generated data or not as opposed to currently being unaware of if something ran in the case where no data was generated, in which case, we still want to know how long it took to generate nothing. :)
Profiling this issue reveals the error as reported originally. Including the full query here:
org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [SELECT
c1.concept_id AS concept_id,
c2.concept_name AS category,
ard1.min_value AS min_value,
ard1.p10_value AS p10_value,
ard1.p25_value AS p25_value,
ard1.median_value AS median_value,
ard1.p75_value AS p75_value,
ard1.p90_value AS p90_value,
ard1.max_value AS max_value
FROM results.achilles_results_dist ard1
INNER JOIN
cdm.concept c1
ON CAST(CASE WHEN REGEXP_INSTR(ard1.stratum_1, '^[\-\+]?(\\d*\\.)?\\d+([Ee][\-\+]?\\d+)?$') = 1 THEN ard1.stratum_1 ELSE null END AS INT) = c1.concept_id
INNER JOIN
cdm.concept c2
ON CAST(CASE WHEN REGEXP_INSTR(ard1.stratum_2, '^[\-\+]?(\\d*\\.)?\\d+([Ee][\-\+]?\\d+)?$') = 1 THEN ard1.stratum_2 ELSE null END AS INT) = c2.concept_id
WHERE ard1.analysis_id = 706
AND ard1.count_value > 0
AND c1.concept_id = ?];
SQL state [XX000]; error code [500310]; [Amazon](500310) Invalid operation: Invalid digit, Value '.', Pos 2, Type: Integer
Details:
-----------------------------------------------
error: Invalid digit, Value '.', Pos 2, Type: Integer
code: 1207
context: 43.45
query: 230492237
location: :0
process: query10_199_230492237 [pid=0]
-----------------------------------------------;
From discussion with @alondhe @chrisknoll - we can revise the query above to use the analysis_id
in the CASE statement to prevent these types of CASTing problems.
Expected behavior
When Achilles has been executed with benchmark records, these records have decimal values stored in the stratum_1 fields of achilles_results and achilles_results_dist. These should not cause a casting issue for the drilldown report SQLs.
Actual behavior
In Redshift, we see this cause errors. As the drilldown queries utilize a casting of isnumeric, if the value is a decimal value, it will be numeric but not castable to int.
Steps to reproduce behavior
Run Achilles with benchmarks enabled. Try examining the drilldown reports in Data Sources. Observe no results, and then see the SQL errors in the WebAPI logs.