I believe I have pinpointed the problem. The query produced when the achilles.R script renders merge_achilles_tables.sql (snippet provided below) introduces an **invalid character(0) column** throughout the rendered SQL, and that is what causes the error. I have provided examples of the rendered query and the resulting error message below.
SQL:

```sql
--HINT DISTRIBUTE_ON_KEY(analysis_id)
CREATE TABLE resultsDataset.achilles_results
AS
SELECT
analysis_id, stratum_1, stratum_2, stratum_3, stratum_4, stratum_5, count_value
FROM
(
select cast(analysis_id as int64) as analysis_id, cast(stratum_1 as string) as stratum_1, cast(stratum_2 as string) as stratum_2, cast(stratum_3 as string) as stratum_3, cast(stratum_4 as string) as stratum_4, cast(stratum_5 as string) as stratum_5, cast(count_value as int64) as count_value from
resultsDataset.tmpach_0 union all select 2000000 as analysis_id, '13.638783 secs' as stratum_1, null as stratum_2, null as stratum_3, null as stratum_4, null as stratum_5, character(0)
union all
select cast(analysis_id as int64) as analysis_id, cast(stratum_1 as string) as stratum_1, cast(stratum_2 as string) as stratum_2, cast(stratum_3 as string) as stratum_3, cast(stratum_4 as string) as stratum_4, cast(stratum_5 as string) as stratum_5, cast(count_value as int64) as count_value from
resultsDataset.tmpach_1 union all select 2000001 as analysis_id, '11.222780 secs' as stratum_1, null as stratum_2, null as stratum_3, null as stratum_4, null as stratum_5, character(0) ...
```
Error:

```
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Function not found: character at [13:225]",
    "reason" : "invalidQuery"
  } ],
  "message" : "Function not found: character at [13:225]",
  "status" : "INVALID_ARGUMENT"
}
```
The relevant code from achilles.R that builds this query:

```r
if (!sqlOnly & includeRawCost) {
  # obtain the runTime for this analysis
  benchmarkSqls <- lapply(c(15000, 16000), function(rawCostId) {
    runTime <- .getAchillesResultBenchmark(rawCostId, outputFolder)
    benchmarkSelects <- lapply(resultsTable$schema$FIELD_NAME, function(c) {
      if (tolower(c) == "analysis_id") {
        sprintf("%d as analysis_id", .getBenchmarkOffset() + rawCostId)
      } else if (tolower(c) == "stratum_1") {
        sprintf("'%s' as stratum_1", runTime)
      } else if (tolower(c) == "count_value") {
        sprintf("%d as count_value", smallCellCount + 1)
      } else {
        sprintf("NULL as %s", c)
      }
    })
    SqlRender::render(sql = "select @benchmarkSelect",
                      benchmarkSelect = paste(benchmarkSelects, collapse = ", "))
  })
  benchmarkSql <- paste(benchmarkSqls, collapse = " union all ")
  benchmarkSql <- SqlRender::translate(sql = benchmarkSql, targetDialect = connectionDetails$dbms)
  detailSqls <- c(detailSqls, benchmarkSql)
}

SqlRender::loadRenderTranslateSql(sqlFilename = "analyses/merge_achilles_tables.sql",
                                  packageName = "Achilles",
                                  dbms = connectionDetails$dbms,
                                  warnOnMissingParameters = FALSE,
                                  createTable = createTable,
                                  resultsDatabaseSchema = resultsDatabaseSchema,
                                  detailType = resultsTable$detailType,
                                  detailSqls = paste(detailSqls, collapse = " \nunion all\n "),
                                  fieldNames = paste(resultsTable$schema$FIELD_NAME, collapse = ", "),
                                  smallCellCount = smallCellCount)
```
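For context on how this text ends up in the query: SqlRender::render only substitutes the @benchmarkSelect parameter, so whatever paste() returns is injected verbatim into the SELECT. A minimal illustration, with the column values borrowed from the rendered SQL above:

```r
library(SqlRender)

# render() performs simple parameter substitution on the @benchmarkSelect tag.
benchmarkSelect <- "2000000 as analysis_id, '13.638783 secs' as stratum_1, null as stratum_2"
render(sql = "select @benchmarkSelect", benchmarkSelect = benchmarkSelect)
# [1] "select 2000000 as analysis_id, '13.638783 secs' as stratum_1, null as stratum_2"

# translate() then converts the rendered SQL to the target dialect, e.g. BigQuery.
translate(sql = "select 2000000 as analysis_id", targetDialect = "bigquery")
```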
We are experiencing the same issue on SQL Server. #433
@t-abdul-basser @alondhe @cukarthik I am finding the following issues, which would certainly break the Achilles/SqlRender process.
In analyses 903 and 802, stratum 4 is mislabeled. This causes a break in the merge. I fixed it locally.
Analysis 1425 does not have the appropriate SqlRender replacement tag to allow insertion of the schema, database, and whatever other prefix is appropriate (scratch or temp). When downstream code is looking for the name tmpach_1425, for example, there is only something called ACHILLES_PREFIX_1425. I will make this adjustment locally as well and move forward.
Here is an example where you can see stratum 18 being generated but stratum 4 being referenced later on. Is the stratum_18 name supposed to signify something? The downstream code wants stratum 4.
```sql
select analysis_id,
cast(null as varchar(255)) as stratum_1, cast(null as varchar(255)) as stratum_2, cast(null as varchar(255)) as stratum_3,
**cast(null as varchar(255)) as stratum_18**, cast(null as varchar(255)) as stratum_5,
count_value, min_value, max_value, avg_value, stdev_value, median_value, p10_value, p25_value, p75_value, p90_value
into cdm_mdcr.results.tmpach_dist_1803
from `#tempResults_1803`
```
Thanks @djb2188. Interesting. Please submit a PR when ready so that we can review these changes.
@alondhe @t-abdul-basser The `character(0)` string is produced by this code, `sprintf("%d as count_value", smallCellCount + 1)`, if the value of `smallCellCount` is set to `NULL` per the package documentation to disable masking of the actual counts when it is not needed. A potential work-around might be to set `smallCellCount = 0` instead of `NULL`.
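A minimal sketch of that behavior, assuming it is run outside the package with the other column expressions hard-coded to the values seen in the rendered SQL above:

```r
smallCellCount <- NULL

# NULL + 1 evaluates to a zero-length numeric vector, and sprintf() with a
# zero-length argument returns character(0) rather than a formatted string.
countSelect <- sprintf("%d as count_value", smallCellCount + 1)
countSelect
# character(0)

# When the zero-length element is pasted together with the other column
# expressions, it is deparsed to the literal text "character(0)", which is
# exactly what appears at the end of each benchmark row in the rendered SQL.
benchmarkSelects <- list("2000000 as analysis_id",
                         "'13.638783 secs' as stratum_1",
                         "null as stratum_2",
                         countSelect)
paste(benchmarkSelects, collapse = ", ")
# [1] "2000000 as analysis_id, '13.638783 secs' as stratum_1, null as stratum_2, character(0)"
```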
This last comment is accurate. I will update the documentation to note that `smallCellCount` should be set to `0` instead of `NULL` to close this issue.
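A usage sketch of that work-around; the parameter names follow the Achilles documentation, but the connection details and schema names below are placeholders to be replaced with your own:

```r
library(Achilles)
library(DatabaseConnector)

# Placeholder connection details; substitute your own BigQuery (or other) settings.
connectionDetails <- createConnectionDetails(dbms = "bigquery",
                                             connectionString = "jdbc:bigquery://...",
                                             user = "",
                                             password = "")

# smallCellCount = 0 keeps censoring effectively disabled (no count is <= 0)
# while still giving sprintf() a numeric value for the benchmark rows.
achilles(connectionDetails = connectionDetails,
         cdmDatabaseSchema = "cdmDataset",
         resultsDatabaseSchema = "resultsDataset",
         smallCellCount = 0)
```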
Expected behavior
The generation of summary and results table data for an OMOP v5.3 CDM instance sourced from BigQuery after executing achilles.
Actual behavior
An 'Error getting job status' error pops up while attempting to merge the created Achilles result tables, several minutes into the execution.
The interrupted execution leaves multiple temp tables remaining in the BigQuery dataset designated as the scratch database schema.
Steps to reproduce behavior