Analyticsphere / bq2

SQL query development for Connect for Cancer Preventions' BQ2 database.
https://analyticsphere.github.io/bq2/
0 stars 1 forks source link

Select-all-that-apply fix: Convert 0 and 1 to 104430631 and 353358909 #5

Open jacobmpeters opened 8 months ago

jacobmpeters commented 8 months ago

Description:

Responses to select-all-that-apply questions in the Quest surveys are given as arrays:

{questionCID: [responseCID1, responseCID2, responseCIDN]}

They are then flattened as:

{questionCID_responseCID1: "0", 
 questionCID_responseCID2: "1", 
 questionCID_responseCIDN: "1"}

where "0" and "1" represent "no" and "yes", respectively.

However, the data dictionary prescribes that "no" and "yes" should be represented by the CIDs 104430631 and 353358909, respectively.

Action Items:

jacobmpeters commented 8 months ago

User-defined GoogleSQL Function: https://github.com/Analyticsphere/bq2/blob/0af3e0fee4e5fcc1be14e25f8931a308aa53323a/custom_sql_functions/yes_or_no.sql#L9-L14

Usage:

SELECT 
   Connect_ID,
   yes_or_no(var) as var
FROM 
   FlatConnect.table
jacobmpeters commented 8 months ago

Notes from Nicole:

jacobmpeters commented 8 months ago

Adding this note after a discussion with @jeannewu:

Note: If we were to fix this directly in the flattening scheduled queries, we would simply need to change 1/0 to be 353358909/104430631 in the following function in each query:

https://github.com/Analyticsphere/flatteningRequests/blob/288dc5dc3ead45254fe2f82f6ca128285f7e76b7/gcp_query_tools/generate_flattening_query.R#L122-L138

  function handleRowJS(row) {
    for (let arrPath of Object.keys(arraysToBeFlattened)) {
      let currObj = {};
      let inputConceptIdList = getNestedObjectValue(row, arrPath);
      if (!inputConceptIdList || inputConceptIdList.length === 0) continue;
      inputConceptIdList = inputConceptIdList.map(v => +v);
      for (let cid of arraysToBeFlattened[arrPath]) {
        if (inputConceptIdList.indexOf(cid) >= 0) {
          currObj["%s" + cid] = 1;
        } else currObj["%s" + cid] = 0;
      }
      setNestedObjectValue(row, arrPath, currObj);
    }
    return JSON.stringify(row);
  }
  const row = JSON.parse(input_row);
  return handleRowJS(row);

If we did this, it would cause many changes to the data structure in BQ1 modules that would disrupt Kelsey's reports. We are choosing to leave it as is in BQ1 (for now) and resolve it with SQL for BQ2.

jeannewu commented 8 months ago

Thanks, Jake.