Here are a few examples from the public portal of clinical data that is declared type=numeric but has non-numerical values.
You can obtain this list with a query like the one below, which looks through both the clinical_patient and clinical_sample tables. We need to figure out:
What are the documented rules around these kinds of values?
Does the legacy code actually handle them properly?
Is there any way to handle these performantly in ClickHouse, since we're going to have to handle at least some of these cases?
-- Patient-level values of numeric attributes that do not look like plain decimals
SELECT DISTINCT ATTR_VALUE FROM clinical_patient targ
  JOIN clinical_attribute_meta cam ON targ.ATTR_ID = cam.ATTR_ID
  WHERE DATATYPE = 'number'
  AND NOT REGEXP_LIKE(ATTR_VALUE, '^-?[0-9.]+$')
UNION
-- Same check for sample-level values
SELECT DISTINCT ATTR_VALUE FROM clinical_sample targ
  JOIN clinical_attribute_meta cam ON targ.ATTR_ID = cam.ATTR_ID
  WHERE DATATYPE = 'number'
  AND NOT REGEXP_LIKE(ATTR_VALUE, '^-?[0-9.]+$')
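One caveat when reading the results: the pattern in the query above is a rough heuristic. It accepts any run of digits and dots (so something like `1.2.3` would not be listed), and it rejects forms such as scientific notation that a parser might still accept. The Python sketch below reuses the query's pattern next to a stricter one to bucket values accordingly. The stricter pattern, the function name `classify`, the bucket labels, and the sample values in the test are all hypothetical and for illustration only; they are not taken from the portal data or from any documented cBioPortal rule.

```python
import re

# Same pattern as the SQL query: optional minus, then digits and/or dots.
QUERY_PATTERN = re.compile(r"-?[0-9.]+")

# Hypothetical stricter pattern: optional sign, a single decimal point at most,
# and an optional exponent. This is an assumption, not a documented rule.
STRICT_PATTERN = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def classify(value: str) -> str:
    """Bucket a raw attribute value by how the two patterns treat it."""
    passes_query = QUERY_PATTERN.fullmatch(value) is not None
    passes_strict = STRICT_PATTERN.fullmatch(value) is not None
    if passes_query and passes_strict:
        return "numeric"
    if passes_query:
        # The query would NOT list this value, yet it is not a valid number.
        return "missed_non_numeric"
    if passes_strict:
        # The query WOULD list this value, yet it is a valid number.
        return "flagged_but_numeric"
    return "non_numeric"
```

Running the distinct values returned by the query through a classifier like this is one way to separate genuinely non-numeric entries from artifacts of the regex before deciding which cases need handling.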
@uklineale @haynescd