Open jacobmpeters opened 10 months ago
I moved this one to a separate issue since it is slightly different from the one in #4.
When we confirm with @FrogGirl1123 that these should be partitioned, we can use the following SQL to do so:
SELECT
-- Variable with mix of ages (e.g., 55) and years (e.g., 1987)
D_317093647,
-- Case when its an age
CASE
WHEN REGEXP_CONTAINS(D_317093647, r'^\d{1,2}$')
AND CAST(D_317093647 AS INT64) BETWEEN 0 AND 99
THEN CAST(D_317093647 AS INT64)
ELSE
NULL
END AS D_317093647_D_206625031,
-- Case when its a year
CASE
WHEN REGEXP_CONTAINS(D_317093647, r'^\d{4}$')
THEN CAST(D_317093647 AS INT64)
ELSE
NULL
END AS D_317093647_D_261863326
FROM
`nih-nci-dceg-connect-prod-6d04.FlatConnect.merged_module1`
WHERE
D_317093647 IS NOT NULL
This assumes that ages are between 0-99. It will need to be tweaked if this is not the case. Some additional checks for years might also be appropriate here..
@jacobmpeters Thanks a lot. You have made such a nice patch on his issue of this variable I've just thought about. But I think the variables as 623218391 & 802622485 might be more updated than this pair of 206625031 and 261863326. How is your opinion?
@jeannewu Good catch! I agree. Thanks for looking it over so quickly.
I'll update those below:
SELECT
-- Variable with mix of ages (e.g., 55) and years (e.g., 1987)
D_317093647,
-- Case when its an age
CASE
WHEN REGEXP_CONTAINS(D_317093647, r'^\d{1,2}$')
AND CAST(D_317093647 AS INT64) BETWEEN 0 AND 99
THEN CAST(D_317093647 AS INT64)
ELSE
NULL
END AS D_317093647_D_623218391,
-- Case when its a year
CASE
WHEN REGEXP_CONTAINS(D_317093647, r'^\d{4}$')
THEN CAST(D_317093647 AS INT64)
ELSE
NULL
END AS D_317093647_D_802622485
FROM
`nih-nci-dceg-connect-prod-6d04.FlatConnect.merged_module1`
WHERE
D_317093647 IS NOT NULL
@jacobmpeters Just talked with Kelsey on this variable, Kelsey said she would let Tony check and fix this variable in Firestore. I also checked in the Connect.module1_v2: there are more entries under this variable recently. We need let Tony know this issue and stop and fix it at the beginning. And we can fix the current ones for our BQ2 if it is not too much burden. How is your opinion?
Thanks for the update @jeannewu.
It would be great if Tony can fix it so that this issue doesn't occur for new data. But, unless he also plans to correct the old data in Firestore, I think we will still need to correct this for BQ2. Can you or @KELSEYDOWLING7 update this issue after the conversation with Tony?
Notes from Nicole:
From Jing @jeannewu in MS Teams:
Originally posted by @jacobmpeters in https://github.com/Analyticsphere/bq2/issues/4#issuecomment-1870534028
To view problem: