A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License
CLI DataHarmonizer dh-validate.py script for validating tsv, csv, xls, and xlsx content files #443
This takes care of all the validation issues that show up when validating DataHarmonizer-produced tabular data (in tsv, csv, xls, and xlsx formats) with the linkml-validate CLI. The strategy is to create a temporary .yaml output file containing the changes needed for linkml-validate to process the data correctly. Namely:
The "section label" row of a DH data file is stripped off (as well as any other rows leading up to the row that defines column labels).
Column headers, which are often presented using their LinkML slot titles instead of their slot names, are renamed to slot names so that linkml-validate works; otherwise it errors out about not finding slots.
Multiselect slot values are converted into an array of values.
Values in decimal, float, and integer slots are converted to numeric types.
A copy of the transformed file is saved in .yaml; a header-transformed copy is saved as well if needed.
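The transformations listed above can be sketched roughly as follows. This is an illustrative approximation, not the code in dh-validate.py: the `SLOTS` table, the `transform` and `to_number` helpers, and the `;` multiselect delimiter are all assumptions made for the example, and the sketch returns plain Python records rather than writing a .yaml file. In practice the slot titles, ranges, and multivalued flags would be read from the LinkML schema.

```python
import csv
import io

# Hypothetical slot metadata; a real script would load this from the LinkML schema.
SLOTS = {
    "sample_id": {"title": "Sample ID", "range": "string", "multivalued": False},
    "host_age": {"title": "Host age", "range": "integer", "multivalued": False},
    "ct_value": {"title": "Ct value", "range": "float", "multivalued": False},
    "symptoms": {"title": "Symptoms", "range": "string", "multivalued": True},
}

# Assumed mapping from slot range to a Python numeric type.
NUMERIC_CASTS = {"integer": int, "float": float, "decimal": float}


def to_number(value, cast):
    """Cast to a number when possible; leave the text as-is otherwise
    so a downstream validator can report the bad value."""
    try:
        return cast(value)
    except ValueError:
        return value


def transform(text, slots=SLOTS, delimiter="\t", multi_sep=";"):
    """Turn DataHarmonizer-style tabular text into a list of records."""
    title_to_name = {meta["title"]: name for name, meta in slots.items()}
    known = set(slots) | set(title_to_name)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))

    # Skip the section-label row (and anything else) before the first row
    # whose cells are all recognised slot titles or slot names.
    header_idx = next(
        (i for i, row in enumerate(rows) if row and all(c in known for c in row)),
        None,
    )
    if header_idx is None:
        raise ValueError("no column-label row found")

    # Rename slot titles to slot names; names pass through unchanged.
    header = [title_to_name.get(c, c) for c in rows[header_idx]]

    records = []
    for row in rows[header_idx + 1:]:
        record = {}
        for name, raw in zip(header, row):
            value = raw.strip()
            if not value:
                continue  # omit empty cells
            meta = slots[name]
            cast = NUMERIC_CASTS.get(meta["range"], str)
            if meta["multivalued"]:
                # Multiselect cell becomes an array of values.
                parts = [v.strip() for v in value.split(multi_sep)]
                record[name] = [to_number(v, cast) for v in parts if v]
            else:
                record[name] = to_number(value, cast)
        if record:
            records.append(record)
    return records


if __name__ == "__main__":
    sample = (
        "Identifiers\t\tClinical\t\n"
        "Sample ID\tHost age\tCt value\tSymptoms\n"
        "S1\t42\t21.5\tfever; cough\n"
    )
    print(transform(sample))
```

Detecting the column-label row by matching cells against known slot titles and names avoids hard-coding how many rows precede it, which is one plausible way to handle "any other rows leading up to" the header.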
We haven't tested it on .json, .yaml, or .yml input, but it may work.