cldf / csvw

CSV on the web
Apache License 2.0
37 stars 6 forks source link

How to Enforce Unique Constraints in a JSON Schema for Patient and Facility Data #76

Closed megin1989 closed 4 months ago

megin1989 commented 4 months ago

I've added a JSON schema below and need to integrate two rules:

  1. PAT_MRN_ID must have a unique value per patient per facility.
  2. FACILITY_ADDRESS1 must be unique per facility.

To achieve this, I've added "primaryKey": ["PAT_MRN_ID", "FACILITY_ID"] and set FACILITY_ADDRESS1 to be unique. Is there anything else we can do in the JSON schema? Please help.

{ "@context": "http://www.w3.org/ns/csvw", "tables": [ { "url": "data/QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv", "tableSchema": { "columns": [ { "name": "PAT_MRN_ID", "titles": "PAT_MRN_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "FACILITY_ID", "titles": "FACILITY_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "FACILITY_LONG_NAME", "titles": "FACILITY_LONG_NAME", "datatype": "string", "required": true, "pattern": "^[a-zA-Z\s]+$" }, { "name": "ORGANIZATION_TYPE", "titles": "ORGANIZATION_TYPE", "datatype": "string", "required": true, "enum": [ "prov", "dept", "team", "govt", "ins", "pay", "edu", "reli", "crs", "cg", "bus", "other", "laboratory", "imaging", "pharmacy", "health-information-network", "health-data-aggregator", "Hospital", "DTC", "SNF", "SCN", "CBO", "OMH", "OASAS", "Practice", "Article 36", "Article 40", "MCO" ] }, { "name": "FACILITY_ADDRESS1", "titles": "FACILITY_ADDRESS1", "datatype": "string", "required": true, "constraints": { "pattern": "^(?!\sP\.?O\.?\sBox\s*\d+)[a-zA-Z0-9\s]+$", "unique": true } }, { "name": "FACILITY_ADDRESS2", "titles": "FACILITY_ADDRESS2", "datatype": "string" }, { "name": "FACILITY_CITY", "titles": "FACILITY_CITY", "datatype": "string", "required": false, "constraints": { "enum": ["New York City", "Albany", "Buffalo", "Rochester", "Yonkers"] // Update with actual cities in New York State } }, { "name": "FACILITY_STATE", "titles": "FACILITY_STATE", "datatype": "string", "enum": ["New York", "NY"] }, { "name": "FACILITY_ZIP", "titles": "FACILITY_ZIP", "datatype": "string", "required": true, "pattern": "^\d{5}(-\d{4})?$" }, { "name": "VISIT_PART_2_FLAG", "titles": "VISIT_PART_2_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] }, { "name": "VISIT_OMH_FLAG", "titles": "VISIT_OMH_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] }, { "name": "VISIT_OPWDD_FLAG", "titles": "VISIT_OPWDD_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] } ], "primaryKey": ["PAT_MRN_ID", "FACILITY_ID"] }, "dialect": { "delimiter": "|" } } ] }

xrotwang commented 4 months ago

Unfortunately, there is no unique property defined for columns in CSVW. (Also no enum or constraint properties.) It looks like you are trying to use DataPackage's TableSchema syntax here. But CSVW is a different specification. The csvw package only deals with the latter (apart from some basic support for bootstrapping CSVW from DP).

megin1989 commented 4 months ago

Sorry, I get a note from my friend like this. CSVW (CSV on the Web) metadata provides a way to describe the structure and constraints of CSV files. It has some support for constraints such as unique, enum, and other validation mechanisms.

is this correct? Please explain. We need to apply our various business rules here.

xrotwang commented 4 months ago

Here's the spec: https://www.w3.org/TR/tabular-metadata/ and here's some description of our csvw package's (lack of) support for CSVW features: https://github.com/cldf/csvw#known-limitations