cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License
91 stars 23 forks source link

Add support for unique_keys field in LinkML classes #371

Closed pkalita-lbl closed 1 year ago

pkalita-lbl commented 1 year ago

These changes add support for the unique_keys slot on ClassDefinition in the LinkML metamodel during validation. The core logic is implemented in a new validateUniqueValues function in validation.js with corresponding unit tests in validation.test.js. This utility function is also now being used to validate the uniqueness of identifier columns.

Fixes #369

pkalita-lbl commented 1 year ago

cc: @turbomam

turbomam commented 1 year ago

cc: @turbomam

Great. Is there anything I can do to review, since I'm not a JS developer? I'm not even sure what the tests do.

ddooley commented 1 year ago

Ok, I'll have a chance to review tomorrow! Thanks for getting this going!

ddooley commented 1 year ago

So I can't quite follow the algorithm (my brain limitation). I see there's a check for duplicate values within a column. I can't quite see how duplicate row keycolumn values are checked for. An index of each keycolumn's value -> rows it appears in is made? Then in 2nd pass each row's key combo is checked somehow?

The alternative would be to create an index composed of each row's a single key string as a tab-delimited merged string of the keycolumn values present in that row? Then as each new row is being processed, collisions with past row can be detected in one pass, and error info for colliding cells that are part of the collision can be marked?

ddooley commented 1 year ago

Functionality and performance look great!

ddooley commented 1 year ago

All set to merge?

pkalita-lbl commented 1 year ago

Sorry for not saying so yesterday, but yes, I was ready to merge 😁 Thanks