MathematicalMedicine / diver-issues

Semipublic tracking of issues for the DIVER front end
0 stars 0 forks source link

Sufficiently genericize constraints implementation such that it can be used unmodified in all of cohort, pedigree, and custom variable creation #274

Open Viqsi opened 2 weeks ago

Viqsi commented 2 weeks ago

This is something I've been doing vague experimentation on but haven't really recorded here, and I really ought to.

The short version is that right now we have three different implementations of "constraining a variable" on the backend:

What we really ought to have is a single implementation of constraint functionality that can work with all these cases, such that they can be broadly supported. We have that potentially built into the frontend already and have had to work around it to accommodate backend limitations in this regard. So unifying it would be helpful.

Viqsi commented 2 weeks ago

Taken from my worknotes diary on this:

2024-11-04 10:54:19-0500 So, about the wild and crazy experimental ideas I had before.

Basically I'm looking at having a single "comparison function" that is akin to VALUE_IS_BETWEEN but covers all cases. Then all I need is a comparison type code. That would cover all needs. The prepared statement generation would just have to call that function with the correct params.

The worry is that we're seemingly having to bypass much of the behavior of SQL builtins. Certainly it wouldn't be as optimal. Maybe that could be addressed by, instead of doing the comparison in the function, figuring out what the comparison should be in the function and returning SQLish text for a prepared statement. The concern there is that I'm not sure if all workarounds will actually cooperate with such an approach - VALUE_IS_BETWEEN behavior changes partly based on what the target value is, after all. So to some extent custom comparison functions would still be needed.

Essentially the core idea being toyed with is always using prepared statements (and getting rid of the twisted maze of macros that currently handles cohort creation in the process). That seems a feasible goal.

The bulk of that work already more or less exists as it's the approach used in API_ParseJSONDefinition. It'd be basically adapting it to broader circumstances, and then adapting all of the affected classes of cohorts (subset cohorts and pedigree cohorts) to fit. And figuring out something for Set Operation Cohorts, since they use that macro system but fundamentally aren't based on a SQL constraint but rather on interactions between subqueries.

The end result of this might thereby lead to API changes. Certainly, we'd have to do something about the "PEDIGREE" cohort type as it doesn't specify the constraint type. (But if we go anywhere significant with #260, we may end up doing that anyways...)