OHDSI / Hydra

An R package and Java library for hydrating package skeletons into executable R study packages based on specifications in JSON format.
https://ohdsi.github.io/Hydra
4 stars 7 forks source link

Generate stats is hard coded to be false #21

Closed gowthamrao closed 3 years ago

gowthamrao commented 3 years ago

Expected behavior: we need to be able to generate sql with statistics

Actual behavior: Hydra hard codes generate stats to FALSE https://github.com/OHDSI/Hydra/blob/c623d6209af41f80e27ff09e144812a31c64975d/java/org/ohdsi/hydra/actionHandlers/JsonToSql.java#L36

gowthamrao commented 3 years ago

Impact - the generated SQL has

FROM #final_cohort CO ;

{0 != 0}?{ -- Find the event that is the 'best match' per person.
-- the 'best match' is defined as the event that satisfies the most inclusion rules. -- ties are solved by choosing the event that matches the earliest inclusion rule, and then earliest.

select q.person_id, q.event_id

gowthamrao commented 3 years ago

Maybe we could replace the use of java with R to generate the sql?

genOp <- CirceR::createGenerateOptions(cohortIdFieldName = "cohort_definition_id", cohortId = fileName, cdmSchema = "@cdm_database_schema", targetTable = "@target_cohort_table", resultSchema = "@results_database_schema", vocabularySchema = "@vocabulary_database_schema", generateStats = TRUE) sql <- CirceR::buildCohortQuery(expression = cohortExpression, options = genOp) SqlRender::writeSql(sql = sql, targetFile = file.path(outputFolder, 'inst', 'sql', 'sql_server', paste0(fileName, '.sql')))

schuemie commented 3 years ago

We can't replace the Java with R, because Hydra is also used in WebAPI to hydrate packages where R is not available.

We can add an argument allowing one to set generate statistics to TRUE, that would default to the current behavior for backwards compatibility.

schuemie commented 3 years ago

Now in the develop branch. I added an example usage here: https://github.com/OHDSI/SkeletonCohortDiagnosticsStudy/commit/ab282edf61dd581f4564fbae36b58e07d8c18767

I haven't tested it. Leaving that to @gowthamrao .

gowthamrao commented 3 years ago

Thank you @schuemie . That makes sense, because WebApi cannot use R.

Would it be possible to add an option called 'useRtoGenerateSql = FALSE', and if TRUE - it could use the installed version of CirceR for

sql <- CirceR::buildCohortQuery(expression = cohortExpression, options = genOp)

IF TRUE, it would overwrite the SQL output of Java. WebApi wont be impacted because Java would render the SQL by default

Reason: it would allow R users to select CirceR version

schuemie commented 3 years ago

This seems to be a very special use case, so I don't want to add a lot of complexity to Hydra (that would also need to have its own testing apparatus) just for that.

If a user would want to use a specific CirceR version instead they can always do that after Hydra is finished, and simply overwrite the SQL files.

gowthamrao commented 3 years ago

Yes - that makes sense. Agree. thanks

I haven't tested it. Leaving that to @gowthamrao .

btw - tested the new code after changes, works perfectly