cldf / csvw

CSV on the web
Apache License 2.0
36 stars 6 forks source link

Validating Case Sensitivity and Value Combinations in JSON Schema #77

Closed megin1989 closed 3 months ago

megin1989 commented 3 months ago

I'm working with a JSON schema and need to validate certain constraints. Specifically, I have the following requirements:

Case Sensitivity:

Ensure that ENCOUNTER_CLASS_CODE values are case-sensitive. Ensure that the values in ENCOUNTER_CLASS_CODE and ENCOUNTER_CLASS_CODE_DESCRIPTION are case-sensitive. Combination of Values:

for this, I have added the steps below. Is this correct? "datatype": { "base": "string", "format": "^[A-Z]+$" },

Check combinations of values in the screening table against a reference table. Ensure that the combination of ENCOUNTER_CLASS_CODE and ENCOUNTER_CLASS_CODE_DESCRIPTION exists in the reference table (e.g., AMB - ambulatory).

{ "@context": "http://www.w3.org/ns/csvw", "tables": [ { "url": "data/ENCOUNTER_CLASS_REFERENCE.csv", "tableSchema": { "columns": [ { "name": "code", "titles": "Code", "datatype": { "base": "string", "format": "^[A-Z]+$" }, "required": true }, { "name": "description", "titles": "Description", "datatype": "string", "required": true }, { "name": "system", "titles": "System", "datatype": "string", "required": true } ], "primaryKey": ["code"] } }, { "url": "data/SCREENING_qcs-test-20240603-testcase4.csv", "tableSchema": { "columns": [ { "name": "PAT_MRN_ID", "titles": "PAT_MRN_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "FACILITY_ID", "titles": "FACILITY_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "ENCOUNTER_ID", "titles": "ENCOUNTER_ID", "datatype": "string", "constraints": { "unique": true } }, { "name": "ENCOUNTER_CLASS_CODE", "titles": "ENCOUNTER_CLASS_CODE", "datatype": { "base": "string", "format": "^[A-Z]+$" }, "required": true }, { "name": "ENCOUNTER_CLASS_CODE_DESCRIPTION", "titles": "ENCOUNTER_CLASS_CODE_DESCRIPTION", "datatype": "string" }, { "name": "ENCOUNTER_CLASS_CODE_SYSTEM", "titles": "ENCOUNTER_CLASS_CODE_SYSTEM", "datatype": "string", "required": true }, { "name": "ENCOUNTER_STATUS_CODE", "titles": "ENCOUNTER_STATUS_CODE", "datatype": "string", "required": true }, { "name": "ENCOUNTER_STATUS_CODE_DESCRIPTION", "titles": "ENCOUNTER_STATUS_CODE_DESCRIPTION", "datatype": "string" }, { "name": "ENCOUNTER_STATUS_CODE_SYSTEM", "titles": "ENCOUNTER_STATUS_CODE_SYSTEM", "datatype": "string", "required": true, "enum": ["http://hl7.org/fhir/encounter-status"] }, { "name": "ENCOUNTER_TYPE_CODE", "titles": "ENCOUNTER_TYPE_CODE", "datatype": "string" }, { "name": "ENCOUNTER_TYPE_CODE_DESCRIPTION", "titles": "ENCOUNTER_TYPE_CODE_DESCRIPTION", "datatype": "string" }, { "name": "ENCOUNTER_TYPE_CODE_SYSTEM", "titles": "ENCOUNTER_TYPE_CODE_SYSTEM", "datatype": "string", "enum": ["SNOMED-CT", "SNOMED", "http://snomed.info/sct"] }, { "name": "SCREENING_STATUS_CODE", "titles": "SCREENING_STATUS_CODE", "datatype": "string", "required": true }, { "name": "SCREENING_STATUS_CODE_DESCRIPTION", "titles": "SCREENING_STATUS_CODE_DESCRIPTION", "datatype": "string" }, { "name": "SCREENING_STATUS_CODE_SYSTEM", "titles": "SCREENING_STATUS_CODE_SYSTEM", "datatype": "string", "required": true, "enum": ["http://hl7.org/fhir/observation-status"] }, { "name": "SCREENING_CODE", "titles": "SCREENING_CODE", "datatype": "string", "required": true, "enum": ["96777-8", "97023-6"] }, { "name": "SCREENING_CODE_DESCRIPTION", "titles": "SCREENING_CODE_DESCRIPTION", "datatype": "string", "required": true, "enum": [ "Accountable health communities (AHC) health-related social needs (HRSN) supplemental questions", "accountable health communities (AHC) health-related social needs (HRSN) supplemental questions", "Accountable health communities (AHC) health-related social needs screening (HRSN) tool", "accountable health communities (AHC) health-related social needs screening (HRSN) tool", "NYS AHC HRSN screening" ] }, { "name": "SCREENING_CODE_SYSTEM_NAME", "titles": "SCREENING_CODE_SYSTEM_NAME", "datatype": "string", "required": true, "enum": ["LN", "LOINC", "http://loinc.org", "NYS standard", "NYS Standard"] }, { "name": "RECORDED_TIME", "titles": "RECORDED_TIME", "datatype": "datetime", "required": true, "pattern": "([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])T([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]|60)(\.[0-9]+)?(Z|(\+|-)([01][0-9]|2[0-3]):([0-5][0-9]))" }, { "name": "QUESTION_CODE", "titles": "QUESTION_CODE", "datatype": "string", "required": true }, { "name": "QUESTION_CODE_DESCRIPTION", "titles": "QUESTION_CODE_DESCRIPTION", "datatype": "string", "required": true }, { "name": "QUESTION_CODE_SYSTEM_NAME", "titles": "QUESTION_CODE_SYSTEM_NAME", "datatype": "string", "required": true, "enum": ["LN", "LOINC", "http://loinc.org"] }, { "name": "UCUM_UNITS", "titles": "UCUM_UNITS", "datatype": "string" }, { "name": "SDOH_DOMAIN", "titles": "SDOH_DOMAIN", "datatype": "string", "required": true }, { "name": "PARENT_QUESTION_CODE", "titles": "PARENT_QUESTION_CODE", "datatype": "string" }, { "name": "ANSWER_CODE", "titles": "ANSWER_CODE", "datatype": "string", "required": true }, { "name": "ANSWER_CODE_DESCRIPTION", "titles": "ANSWER_CODE_DESCRIPTION", "datatype": "string", "required": true }, { "name": "ANSWER_CODE_SYSTEM_NAME", "titles": "ANSWER_CODE_SYSTEM_NAME", "datatype": "string", "required": true, "enum": ["LN", "LOINC", "http://loinc.org"] }, { "name": "POTENTIAL_NEED_INDICATED", "titles": "POTENTIAL_NEED_INDICATED", "datatype": "string", "required": true, "enum": ["Yes", "No", "NA", "yes", "no", "na"] } ], "foreignKeys": [ { "columnReference": ["PAT_MRN_ID"], "reference": { "resource": "QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv", "columnReference": ["PAT_MRN_ID"] } }, { "columnReference": ["ENCOUNTER_CLASS_CODE"], "reference": { "resource": "data/ENCOUNTER_CLASS_REFERENCE.csv", "columnReference": ["code"] } }, { "columnReference": ["ENCOUNTER_CLASS_CODE_DESCRIPTION"], "reference": { "resource": "data/ENCOUNTER_CLASS_REFERENCE.csv", "columnReference": ["description"] } } ] }, "dialect": { "delimiter": "|" } } ] }

xrotwang commented 3 months ago

Seems ok - although I don't really know what "must be case sensitive" means. Foreign keys of datatype string are always treated as being case sensitive.

megin1989 commented 3 months ago

Case sensitive- e.g., AMB - ambulatory should be valid regardless of the case, like amb, Amb, AMb)

One more Check combinations of values in the screening table against a reference table. Ensure that the combination of ENCOUNTER_CLASS_CODE and ENCOUNTER_CLASS_CODE_DESCRIPTION exists in the reference table (e.g., AMB - ambulatory).

xrotwang commented 3 months ago

Case sensitive- e.g., AMB - ambulatory should be valid regardless of the case, like amb, Amb, AMb)

That would be case insensitive, I think. And no, there is no datatype "case insensitive string" in CSVW.

xrotwang commented 3 months ago

Ensure that the combination of ENCOUNTER_CLASS_CODE and ENCOUNTER_CLASS_CODE_DESCRIPTION exists in the reference table (e.g., AMB - ambulatory).

As far as I understand, ENCOUNTER_CLASS_CODE is a foreign key into the reference table. So as long as this key is valid, one could just look up the normalized ENCOUNTER_CLASS_CODE_DESCRIPTION from the reference table.