cldf / csvw

CSV on the web
Apache License 2.0
36 stars 5 forks source link

How to Enforce Unique Constraints in a JSON Schema for Patient and Facility Data #76

Open megin1989 opened 3 days ago

megin1989 commented 3 days ago

I've added a JSON schema below and need to integrate two rules:

  1. PAT_MRN_ID must have a unique value per patient per facility.
  2. FACILITY_ADDRESS1 must be unique per facility.

To achieve this, I've added "primaryKey": ["PAT_MRN_ID", "FACILITY_ID"] and set FACILITY_ADDRESS1 to be unique. Is there anything else we can do in the JSON schema? Please help.

{ "@context": "http://www.w3.org/ns/csvw", "tables": [ { "url": "data/QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv", "tableSchema": { "columns": [ { "name": "PAT_MRN_ID", "titles": "PAT_MRN_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "FACILITY_ID", "titles": "FACILITY_ID", "datatype": "string", "required": true, "constraints": { "unique": true } }, { "name": "FACILITY_LONG_NAME", "titles": "FACILITY_LONG_NAME", "datatype": "string", "required": true, "pattern": "^[a-zA-Z\s]+$" }, { "name": "ORGANIZATION_TYPE", "titles": "ORGANIZATION_TYPE", "datatype": "string", "required": true, "enum": [ "prov", "dept", "team", "govt", "ins", "pay", "edu", "reli", "crs", "cg", "bus", "other", "laboratory", "imaging", "pharmacy", "health-information-network", "health-data-aggregator", "Hospital", "DTC", "SNF", "SCN", "CBO", "OMH", "OASAS", "Practice", "Article 36", "Article 40", "MCO" ] }, { "name": "FACILITY_ADDRESS1", "titles": "FACILITY_ADDRESS1", "datatype": "string", "required": true, "constraints": { "pattern": "^(?!\sP\.?O\.?\sBox\s*\d+)[a-zA-Z0-9\s]+$", "unique": true } }, { "name": "FACILITY_ADDRESS2", "titles": "FACILITY_ADDRESS2", "datatype": "string" }, { "name": "FACILITY_CITY", "titles": "FACILITY_CITY", "datatype": "string", "required": false, "constraints": { "enum": ["New York City", "Albany", "Buffalo", "Rochester", "Yonkers"] // Update with actual cities in New York State } }, { "name": "FACILITY_STATE", "titles": "FACILITY_STATE", "datatype": "string", "enum": ["New York", "NY"] }, { "name": "FACILITY_ZIP", "titles": "FACILITY_ZIP", "datatype": "string", "required": true, "pattern": "^\d{5}(-\d{4})?$" }, { "name": "VISIT_PART_2_FLAG", "titles": "VISIT_PART_2_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] }, { "name": "VISIT_OMH_FLAG", "titles": "VISIT_OMH_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] }, { "name": "VISIT_OPWDD_FLAG", "titles": "VISIT_OPWDD_FLAG", "datatype": "string", "required": true, "enum": ["Yes", "No"] } ], "primaryKey": ["PAT_MRN_ID", "FACILITY_ID"] }, "dialect": { "delimiter": "|" } } ] }

xrotwang commented 3 days ago

Unfortunately, there is no unique property defined for columns in CSVW. (Also no enum or constraint properties.) It looks like you are trying to use DataPackage's TableSchema syntax here. But CSVW is a different specification. The csvw package only deals with the latter (apart from some basic support for bootstrapping CSVW from DP).