Closed megin1989 closed 3 months ago
Your foreignKeys property seems to be ill-specified. It should look like this:
"foreignKeys": [{
  "columnReference": ["PAT_MRN_ID"],
  "reference": {
    "resource": "QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv",
    "columnReference": ["PAT_MRN_ID"]
  }
}]
The error you see complains about the lack of the nested reference property.
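The shape described above can be checked with plain Python before handing the metadata to csvw. This is only a minimal sketch using the standard library: the helper name check_foreign_keys is hypothetical (it is not part of csvw), and it looks only at the keys discussed in this thread, accepting either resource or schemaReference inside reference.

```python
def check_foreign_keys(table_schema):
    """Return a list of problems found in the schema's foreignKeys entries.

    Hypothetical helper for illustration; not part of the csvw package.
    """
    problems = []
    for i, fk in enumerate(table_schema.get("foreignKeys", [])):
        if not isinstance(fk, dict):
            problems.append(f"foreignKeys[{i}]: entry should be an object")
            continue
        # columnReference may be a single column name or a list of names.
        if not isinstance(fk.get("columnReference"), (str, list)):
            problems.append(f"foreignKeys[{i}]: missing columnReference")
        reference = fk.get("reference")
        if not isinstance(reference, dict):
            problems.append(f"foreignKeys[{i}]: missing nested reference object")
            continue
        if "resource" not in reference and "schemaReference" not in reference:
            problems.append(f"foreignKeys[{i}].reference: missing resource")
        if not isinstance(reference.get("columnReference"), (str, list)):
            problems.append(f"foreignKeys[{i}].reference: missing columnReference")
    return problems


# Well-formed: the target table and column sit inside a nested "reference".
good = {
    "foreignKeys": [{
        "columnReference": ["PAT_MRN_ID"],
        "reference": {
            "resource": "QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv",
            "columnReference": ["PAT_MRN_ID"],
        },
    }]
}

# Ill-specified: "resource" is placed next to columnReference, not nested.
bad = {
    "foreignKeys": [{
        "columnReference": ["PAT_MRN_ID"],
        "resource": "QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv",
    }]
}

print(check_foreign_keys(good))
print(check_foreign_keys(bad))
```

The second call reports the missing nested reference object, which is the same kind of problem the error message points at.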
{
  "url": "data/SCREENING_qcs-test-20240603-testcase4.csv",
  "tableSchema": {
    "columns": [
      {"name": "PAT_MRN_ID", "titles": "PAT_MRN_ID", "datatype": "string", "required": true},
      {"name": "FACILITY_ID", "titles": "FACILITY_ID", "datatype": "string", "required": true},
      {"name": "ENCOUNTER_ID", "titles": "ENCOUNTER_ID", "datatype": "string"},
      {"name": "ENCOUNTER_CLASS_CODE", "titles": "ENCOUNTER_CLASS_CODE", "datatype": "string", "required": true},
      {"name": "ENCOUNTER_CLASS_CODE_DESCRIPTION", "titles": "ENCOUNTER_CLASS_CODE_DESCRIPTION", "datatype": "string"},
      {"name": "ENCOUNTER_CLASS_CODE_SYSTEM", "titles": "ENCOUNTER_CLASS_CODE_SYSTEM", "datatype": "string", "required": true},
      {"name": "ENCOUNTER_STATUS_CODE", "titles": "ENCOUNTER_STATUS_CODE", "datatype": "string", "required": true},
      {"name": "ENCOUNTER_STATUS_CODE_DESCRIPTION", "titles": "ENCOUNTER_STATUS_CODE_DESCRIPTION", "datatype": "string"},
      {"name": "ENCOUNTER_STATUS_CODE_SYSTEM", "titles": "ENCOUNTER_STATUS_CODE_SYSTEM", "datatype": "string", "required": true},
      {"name": "ENCOUNTER_TYPE_CODE", "titles": "ENCOUNTER_TYPE_CODE", "datatype": "string"},
      {"name": "ENCOUNTER_TYPE_CODE_DESCRIPTION", "titles": "ENCOUNTER_TYPE_CODE_DESCRIPTION", "datatype": "string"},
      {"name": "ENCOUNTER_TYPE_CODE_SYSTEM", "titles": "ENCOUNTER_TYPE_CODE_SYSTEM", "datatype": "string"},
      {"name": "SCREENING_STATUS_CODE", "titles": "SCREENING_STATUS_CODE", "datatype": "string", "required": true},
      {"name": "SCREENING_STATUS_CODE_DESCRIPTION", "titles": "SCREENING_STATUS_CODE_DESCRIPTION", "datatype": "string"},
      {"name": "SCREENING_STATUS_CODE_SYSTEM", "titles": "SCREENING_STATUS_CODE_SYSTEM", "datatype": "string", "required": true},
      {"name": "SCREENING_CODE", "titles": "SCREENING_CODE", "datatype": "string", "required": true},
      {"name": "SCREENING_CODE_DESCRIPTION", "titles": "SCREENING_CODE_DESCRIPTION", "datatype": "string", "required": true},
      {"name": "SCREENING_CODE_SYSTEM_NAME", "titles": "SCREENING_CODE_SYSTEM_NAME", "datatype": "string", "required": true},
      {"name": "RECORDED_TIME", "titles": "RECORDED_TIME", "datatype": "datetime", "required": true},
      {"name": "QUESTION_CODE", "titles": "QUESTION_CODE", "datatype": "string", "required": true},
      {"name": "QUESTION_CODE_DESCRIPTION", "titles": "QUESTION_CODE_DESCRIPTION", "datatype": "string", "required": true},
      {"name": "QUESTION_CODE_SYSTEM_NAME", "titles": "QUESTION_CODE_SYSTEM_NAME", "datatype": "string", "required": true},
      {"name": "UCUM_UNITS", "titles": "UCUM_UNITS", "datatype": "string"},
      {"name": "SDOH_DOMAIN", "titles": "SDOH_DOMAIN", "datatype": "string", "required": true},
      {"name": "PARENT_QUESTION_CODE", "titles": "PARENT_QUESTION_CODE", "datatype": "string"},
      {"name": "ANSWER_CODE", "titles": "ANSWER_CODE", "datatype": "string", "required": true},
      {"name": "ANSWER_CODE_DESCRIPTION", "titles": "ANSWER_CODE_DESCRIPTION", "datatype": "string", "required": true},
      {"name": "ANSWER_CODE_SYSTEM_NAME", "titles": "ANSWER_CODE_SYSTEM_NAME", "datatype": "string", "required": true},
      {"name": "POTENTIAL_NEED_INDICATED", "titles": "POTENTIAL_NEED_INDICATED", "datatype": "string", "required": true}
    ],
    "foreignKeys": [{
      "columnReference": ["PAT_MRN_ID"],
      "reference": {
        "resource": "QE_ADMIN_DATA_qcs-test-20240603-testcase4.csv",
        "columnReference": ["PAT_MRN_ID"]
      }
    }]
  },
  "dialect": {
    "delimiter": "|"
  }
}
When I update it like this, I get the following errors:
/home/megin/.local/lib/python3.10/site-packages/csvw/metadata.py:426: UserWarning: Invalid property pattern for Column
  warnings.warn('Invalid property {} for {}'.format(k, type_name))
/home/megin/.local/lib/python3.10/site-packages/csvw/metadata.py:426: UserWarning: Invalid property enum for Column
  warnings.warn('Invalid property {} for {}'.format(k, type_name))
Traceback (most recent call last):
  File "/home/megin/workspaces/csv-sql-schema/csv-old/csvw/new.py", line 40, in
Please help me fix this.
But that error seems to come from your own code:
if not isinstance(fk.columns, list) or not isinstance(fk.reference.columnReference, list):
A ForeignKey object indeed has no columns attribute, only a columnReference and a reference.
Thank you. I fixed the errors and got the result, but the output prints 'No validation errors found,' even though our CSV file has validation errors against the JSON file. Is there another Python code option to validate it?
import json
from csvw import TableGroup
datapackage_path = 'datapackage.json'
with open(datapackage_path, 'r') as f:
    frictionless_datapackage = json.load(f)
table_group = TableGroup.from_frictionless_datapackage(frictionless_datapackage)
def validate_table_group(table_group):
    errors = []
    for table in table_group.tables:
        table_url = table.url
        table_schema = table.tableSchema
        for column in table_schema.columns:
            if not hasattr(column, 'name'):
                errors.append(f"Column without a name in table {table_url}")
            if not hasattr(column, 'datatype'):
                errors.append(f"Column {getattr(column, 'name', '')} without a datatype in table {table_url}")
        if hasattr(table_schema, 'primaryKey') and not isinstance(table_schema.primaryKey, list):
            errors.append(f"Invalid primaryKey in table {table_url}")
        if hasattr(table_schema, 'foreignKeys'):
            for fk in table_schema.foreignKeys:
                if not isinstance(fk.columnReference, list) or not isinstance(fk.reference.columnReference, list):
                    errors.append(f"Invalid foreignKey specification in table {table_url}")
    return errors
validation_errors = validate_table_group(table_group)
if validation_errors:
    for error in validation_errors:
        print(f"Validation Error: {error}")
else:
    print("No validation errors found.")
output_file = 'validation_results1.json'
with open(output_file, 'w', encoding='utf-8') as json_file:
    json.dump(validation_errors, json_file, indent=4)
I also tried this code with datapackage.jsonld (JSON-LD) and received the correct errors. Below is my Python code. Is this correct?
import csv
import json
import os
import re
from datetime import datetime
metadata_file = 'datapackage.jsonld'
with open(metadata_file, 'r', encoding='utf-8') as f:
    metadata = json.load(f)
tables = metadata['tables']
def validate_csv(csv_file_path, table_schema, delimiter):
    errors = []
    with open(csv_file_path, 'r', encoding='utf-8') as csv_file:
        reader = csv.DictReader(csv_file, delimiter=delimiter)
        for row_num, row in enumerate(reader, start=1):
            for column in table_schema['columns']:
                col_name = column['name']
                col_titles = column.get('titles', col_name)
                value = row.get(col_titles)
                if column.get('required') and not value:
                    errors.append(f"Row {row_num}: '{col_titles}' is required but missing.")
                if 'enum' in column and value and value not in column['enum']:
                    errors.append(f"Row {row_num}: '{value}' in '{col_titles}' is not a valid value.")
                if 'pattern' in column and value and not re.match(column['pattern'], value):
                    errors.append(f"Row {row_num}: '{value}' in '{col_titles}' does not match pattern {column['pattern']}.")
                if column['datatype'] == 'datetime':
                    try:
                        datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
                    except ValueError:
                        errors.append(f"Row {row_num}: '{value}' in '{col_titles}' is not a valid datetime.")
                if column['datatype'] == 'date':
                    try:
                        datetime.strptime(value, '%Y-%m-%d')
                    except ValueError:
                        errors.append(f"Row {row_num}: '{value}' in '{col_titles}' is not a valid date.")
    return errors
csv_base_dir = ''
validation_results = {}
for table in tables:
    url = table['url']
    table_schema = table['tableSchema']
    dialect = table.get('dialect', {})
    delimiter = dialect.get('delimiter', ',')
    csv_file_path = os.path.join(csv_base_dir, url)
    if not os.path.exists(csv_file_path):
        validation_results[csv_file_path] = ["File not found."]
        continue
    errors = validate_csv(csv_file_path, table_schema, delimiter)
    if errors:
        validation_results[csv_file_path] = errors
    else:
        validation_results[csv_file_path] = ["Valid"]
output_file = 'validation_results.json'
with open(output_file, 'w', encoding='utf-8') as json_file:
    json.dump(validation_results, json_file, indent=4)
print("Validation completed.")
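One detail in validate_csv that may be worth double-checking: the datetime and date checks run even when the cell is empty or the column is absent from the row. datetime.strptime raises ValueError for an empty string and TypeError for None, so an empty optional column would be reported as invalid, and a missing one would stop the run. A minimal sketch of a guard follows; the helper name check_datetime is hypothetical, and the default format string is the one used in the code above.

```python
from datetime import datetime


def check_datetime(value, fmt='%Y-%m-%dT%H:%M:%S'):
    """Return True if value is empty/missing or parses with fmt, else False.

    Hypothetical helper; whether an empty value is allowed at all is left
    to the separate 'required' check.
    """
    if not value:
        # Covers both None (column absent) and '' (empty cell).
        return True
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


print(check_datetime('2024-06-03T10:15:00'))
print(check_datetime('03/06/2024'))
print(check_datetime(''))
print(check_datetime('2024-06-03', fmt='%Y-%m-%d'))
```

Keeping the emptiness test separate from the format test means a required datetime column with an empty cell is reported once, by the required check, instead of twice.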
Well, in your first code snippet you only validate the metadata, not the data - while in the second you go through each row in the actual data file.
If you want to do this for a TableGroup, you can copy the code from CSVW.is_valid:
https://github.com/cldf/csvw/blob/90ac48577a659cc435a5bd442a907781317f7547/src/csvw/metadata.py#L1660-L1666
Can you please give a sample file link or full code?
In your function validate_table_group above, add the code snippet I've given, replacing self and self.tablegroup with table_group.
Yes, again I got this message "No validation errors found."
for table in table_group.tables:
    for _ in table.iterdicts(strict=False):
        pass
    if not table.check_primary_key():  # pragma: no cover
        warnings.warn('Duplicate primary key')
if not table_group.check_referential_integrity(strict=True):
    warnings.warn('Referential integrity check failed')
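One possible reason this still ends in "No validation errors found." is that warnings.warn does not add anything to the errors list that validate_table_group returns, so the final check never sees those problems. Below is a minimal sketch of collecting warnings into a list with the standard library's warnings.catch_warnings(record=True). Here run_checks is a hypothetical stand-in for the csvw calls above, so the example runs without any data files; it assumes the checks report problems through warnings.warn, as the two explicit calls in the snippet do.

```python
import warnings


def run_checks():
    # Hypothetical stand-in for the loop over table_group.tables above;
    # it only emits the same kind of warning that snippet would emit.
    warnings.warn('Referential integrity check failed')


def collect_warnings(check):
    """Run check() and return the messages of all warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeated ones.
        warnings.simplefilter('always')
        check()
    return [str(w.message) for w in caught]


errors = collect_warnings(run_checks)
if errors:
    for error in errors:
        print(f"Validation Error: {error}")
else:
    print("No validation errors found.")
```

In validate_table_group, the same idea would mean wrapping the added loop in the catch_warnings block and extending errors with the recorded messages before returning.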
I also tried this code with datapackage.jsonld (JSON-LD) and received the correct errors.
Since I don't have your data available, I cannot really check - or know what the "correct errors" should be. But if datapackage.jsonld works for you, you might just stick to that?
How do we set a foreign key here? When we run our JSON, we encounter the following errors. I've attached the JSON schema and py file code below.
This is my JSON schema.
The py file code