Novartis / YADA

Open-source Data Ops
Apache License 2.0
81 stars 21 forks source link

Suggestion on name-parameters in (sql) queries #69

Open christopheleroy opened 7 years ago

christopheleroy commented 7 years ago

Hello,

Here is an example that illustrates why I'm suggesting this. So, we have a row of data that may be flagged in the app as "missing" to mean, missing data. The application (aspirin) lets the user confirm whether the row is missing for a good reason or another. When the row's reason is confirmed, we want to update the row as 'confirmed', however, we don't want to flag it as confirmed if the reason was that the row was corrupt (file truncated etc).

So, you can do that in a few ways:

Case 1- in 2 steps (1a) UPDATE tbl SET REASON_MISSING_ROW=?v WHERE p_rowid=?v (1b) UPDATE tbl SET IS_CONFIRMED = (CASE WHEN REASON_MISSING_ROW = 'corrupt' OR REASON_MISSING_ROW IS NULL THEN 0 ELSE 1 END) WHERE p_rowid=?v

Case 2- in 1 step (2) UPDATE tbl SET REASON_MISSING_ROW=?v, IS_CONFIRMED=?n WHERE p_rowid=?v

Another case is to use "named parameters" in Yada queries: Case X - in 1 step: (X) UPDATE tbl SET REASON_MISSING_ROW=?v:reason, IS_CONFIRMED=(CASE when ?v:reason = 'corrupt' OR ?v:reason IS NULL THEN 0 ELSE 1 END) WHERE p_rowid = ?v

The suggestion is to allow tags on the parameters, so that in jsonParams you could run the query with [{qname:"QQ",DATA:[{"reason":"ok",p_rowid:"23132132"},{"reason":"corrupt","P_ROWID":2342363}]] instead of [{qname:"QQ",DATA:[{"REASON_MISSING_ROW":"ok",P_ROWID:"23132132"},{"REASON_MISSING_ROW":"corrupt","P_ROWID":2342363}]]

This enables case X.

Maybe it is a personal slant of mine, but:

Case 2 is not very desirable because the Yada query does not enforce an important business rule. The onus is on the application or the yada developer cow-boys. Not cool. Case 1 also puts the onus on the app developer to remember to call both queries and call them in the right order - however, perhaps (1a) can be configured to force a follow-up query, (1b) -- as a default parameter with post-process [can this be done in Yada 8?] But anyway, it gets complicated to configure these cascading 1a ==> 1b queries, and "named parameters" could be a neat addition to Yada, also allowing an application developer to rely on a "field name" (e.g here 'reason') that may be different than the internal table (REASON_MISSING_ROW). That is much better for data integration, as important as the harmonizer ...

thanks for the consideration, if you agree, I could try to make it happen...

christopheleroy commented 7 years ago

regarding, Case 1, please note that UDPATE tbl set REASON_MISSING_ROW=?v, IS_CONFIRMED = (CASE WHEN REASON_MISSING_ROW = 'corrupt' OR REASON_MISSING_ROW IS NULL THEN 0 ELSE 1 END) WHERE p_rowid=?v would not work because the "CASE/WHEN" uses the value before the UPDATE not the ?v value after the update...

christopheleroy commented 6 years ago

thoughts on this @varontron ? I believe this extension would also solve many of my headaches: I have a few queries that work when submitting positional params but won't work in JSONParams with named params because the SQL is a little too esoteric for the Yada (de)parser. right now, I'm using named params (using native column names) everywhere, and YADA_1 YADA_2 .. YADA_5 'named params in JSONParams' for these queries. This is ugly and very risky when the system has to evolve...

christopheleroy commented 6 years ago

and here is another example. I want to expand my app to merge country inventory and site inventory. Nice, I can expand my yada query: instead of

SELECt * from country_inventory where protocol_number = ?v

I can:

(SELECt from country_inventory UNION select from site_inventory)where protocol_number = ?v

well, no I can't. I can do

SELECT from ( (SELECT from country_inventory UNION select * from site_inventory) where protocol_number = ?v

but I'm not too hopeful on performance.

but if I do SELECT from country_inventory where protocol_number = ?v UNION select from site_inventory where protocol_number = ?v

now, I need to pass 2 protocol_numbers when I call the query.

ALSO, now, I have open the possibility to get a merge inventory of country inventory of study-1 with site inventory of study-2 -- well, that's dumb, unless all my developers are smart, attentive and not overloaded.

ah!

If only I could do:

SELECT from country_inventory where protocol_number = ?v:pcol UNION select from site_inventory where protocol_number = ?v:pcol

that'd be nice

:)

varontron commented 6 years ago

What is "pcol"? Is it a column name in the JSONParams spec?

christopheleroy commented 6 years ago

well, hello -- I wish this would notify me when you post back - maybe you should @-me. So pcol, yes, would be an expected "column name" in the DATA objects of the JSONParams. A virtual column name. The actual column name is "PROTOCOL_NUMBER", the JSON object feature (key) is 'pcol'. @varontron

christopheleroy commented 6 years ago

Here is a new example: Logically, it relies on the 3 same parameters to be passed with the exact same value.

UPDATE site_update SS SET (SAVE_emp521, save_ts) = ( SELECT CASE WHEN save_ts IS NOT NULL THEN save_emp521 WHEN effdt_mod_time > mod_time THEN effdt_emp521 ELSE MOD_EMP521 END AS mod_emp521, CASE WHEN save_ts IS NOT NULL THEN save_ts WHEN effdt_mod_time > mod_time THEN effdt_mod_time ELSE MOD_time END AS mod_time FROM ( SELECT u.save_emp521, u.SAVE_TS, u.mod_emp521 AS effdt_emp521, u.MOD_TIME AS effdt_mod_time, r.MOD_EMP521, r.MOD_TIME FROM STAGE_SITE_DATA r JOIN site_update u ON (u.site_update_id=r.site_update_id) WHERE u.site_update_id = ?n AND (drug_name, strength, strength_unit, container_amt, container_unit, pcn_number, batch, expiry_date, quantity, inv_status, row_comments) not IN ( SELECT drug_name, strength, strength_unit, container_amt, container_unit, pcn_number, batch, expiry_date, quantity, inv_status, row_comments FROM SITE_INVENTORY WHERE supd_commit_id IN (SELECT supd_commit_id FROM SITE_UPDATE_COMMIT WHERE is_Active=1 AND (study_id,site_code) IN (SELECT study_id,site_code FROM site_update WHERE site_update_id=?n))
) ORDER BY r.mod_time DESC ) WHERE ROWNUM <2 ) WHERE site_update_id=?n

If I allow this query to be called with a bug where the 3 parameters have different values, I kind-of compromised my pseudo-audit-trail.

would be much nicer to be able to write:

UPDATE site_update SS SET (SAVE_emp521, save_ts) = ( SELECT CASE WHEN save_ts IS NOT NULL THEN save_emp521 WHEN effdt_mod_time > mod_time THEN effdt_emp521 ELSE MOD_EMP521 END AS mod_emp521, CASE WHEN save_ts IS NOT NULL THEN save_ts WHEN effdt_mod_time > mod_time THEN effdt_mod_time ELSE MOD_time END AS mod_time FROM ( SELECT u.save_emp521, u.SAVE_TS, u.mod_emp521 AS effdt_emp521, u.MOD_TIME AS effdt_mod_time, r.MOD_EMP521, r.MOD_TIME FROM STAGE_SITE_DATA r JOIN site_update u ON (u.site_update_id=r.site_update_id) WHERE u.site_update_id = ?n:draft_id AND (drug_name, strength, strength_unit, container_amt, container_unit, pcn_number, batch, expiry_date, quantity, inv_status, row_comments) not IN ( SELECT drug_name, strength, strength_unit, container_amt, container_unit, pcn_number, batch, expiry_date, quantity, inv_status, row_comments FROM SITE_INVENTORY WHERE supd_commit_id IN (SELECT supd_commit_id FROM SITE_UPDATE_COMMIT WHERE is_Active=1 AND (study_id,site_code) IN (SELECT study_id,site_code FROM site_update WHERE site_update_id=?n:draft_id))
) ORDER BY r.mod_time DESC ) WHERE ROWNUM <2 ) WHERE site_update_id=?n:draft_id

christopheleroy commented 6 years ago

ALSO: I'm not saying you should try to do this ASAP. I'd like to have you green light that this is a feature that is worth pursuing, and then I'd work on it and send you a pull request...