cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Make header check more robust #196

Closed mgopez closed 3 years ago

mgopez commented 3 years ago

Issue:

Fix:

ddooley commented 3 years ago

Hi, thanks for contributing this. I will have a close look at it today. It looks like it will improve matching without compromising any existing functionality.

ddooley commented 3 years ago

@hellothisisMatt I approved this, then had to revert it on further testing.

On the surface this seems to avoid mismatches on fields with extra spaces in their labels. But the algorithm for aligning data columns to the template must still run, and it currently depends on an exact string match. See the unmappedHeaders code in the mapMatrixToGrid function in main.js. That code needs to be changed too; otherwise the data in each column whose header differs only by spaces won't be uploaded. I think that is the only other point in the code that pays attention to the uploaded file's exact field names, but I'm not sure.
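To make the point concrete, here is a minimal sketch of space-tolerant column alignment. It is not the actual mapMatrixToGrid or unmappedHeaders code; `normalizeHeader`, `alignHeaders`, and the sample labels are hypothetical names chosen for illustration, and it assumes headers arrive as plain arrays of strings.

```javascript
// Hypothetical helper: trim the ends and collapse runs of whitespace to one space.
function normalizeHeader(label) {
  return String(label).trim().replace(/\s+/g, ' ');
}

// Hypothetical sketch: map uploaded column indexes to template column indexes,
// comparing normalized labels instead of exact strings.
function alignHeaders(uploadedHeaders, templateHeaders) {
  const templateIndex = new Map();
  templateHeaders.forEach((label, i) => {
    const key = normalizeHeader(label);
    // Keep the first template column if two labels normalize to the same key.
    if (!templateIndex.has(key)) templateIndex.set(key, i);
  });

  const mapping = {};   // uploaded column index -> template column index
  const unmapped = [];  // uploaded labels with no template match
  uploadedHeaders.forEach((label, i) => {
    const key = normalizeHeader(label);
    if (templateIndex.has(key)) {
      mapping[i] = templateIndex.get(key);
    } else {
      unmapped.push(label);
    }
  });
  return { mapping, unmapped };
}

// Example with made-up labels: the first two columns differ only by spacing.
const result = alignHeaders(
  ['sample  collected by', ' host age', 'lab notes'],
  ['sample collected by', 'host age']
);
console.log(result.mapping, result.unmapped);
```

The key design choice is that the same normalization is applied on both sides of the comparison, so the header check and the column alignment cannot disagree about which columns match.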

A less user-friendly approach would be just to issue a report/warning listing the fields that differ only by spaces, and have the user manually fix them in the file before re-uploading. But this isn't as nice as changing mapMatrixToGrid().
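For comparison, the report-only fallback might look roughly like the sketch below. `stripSpaces` and `findSpaceOnlyMismatches` are hypothetical names, not functions in the project, and the sketch assumes that "differ only by spaces" means the labels are identical once all whitespace is removed.

```javascript
// Hypothetical helper: remove all whitespace from a label.
function stripSpaces(label) {
  return String(label).replace(/\s+/g, '');
}

// Hypothetical sketch: list uploaded headers that do not match a template
// header exactly but would match if whitespace were ignored.
function findSpaceOnlyMismatches(uploadedHeaders, templateHeaders) {
  const exact = new Set(templateHeaders);
  const bySpaceless = new Map();
  for (const label of templateHeaders) {
    const key = stripSpaces(label);
    if (!bySpaceless.has(key)) bySpaceless.set(key, label);
  }

  const warnings = [];
  for (const label of uploadedHeaders) {
    if (exact.has(label)) continue; // exact match, nothing to report
    const expected = bySpaceless.get(stripSpaces(label));
    if (expected !== undefined) {
      warnings.push({ found: label, expected });
    }
  }
  return warnings;
}

// Example with made-up labels: only the double-spaced header is reported.
const warnings = findSpaceOnlyMismatches(
  ['host  age', 'sample id', 'notes'],
  ['host age', 'sample id']
);
for (const w of warnings) {
  console.log(`Header "${w.found}" differs only by spaces from "${w.expected}"`);
}
```

This leaves the upload logic untouched, at the cost of pushing the fix back onto the user.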

mgopez commented 3 years ago

Hey @ddooley, thanks for the reply. I can have a look at making the required changes to unmappedHeaders as well.