cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Multivalued value formatting changes after validation #375

Closed by pkalita-lbl 1 year ago

pkalita-lbl commented 1 year ago

Steps to reproduce:

  1. Load the development interface with the CanCOGeN Covid-19 template. Scroll over to the "anatomical part" column.
  2. Using the multivalued picker, choose a few values, and click "Ok" in the modal.
  3. Note that at this point the cell shows the values you chose separated by "; " (a semicolon followed by a space), for example "Eye; Intestine".
  4. Click the "Validate" button.
  5. Note that the value in the cell is now delimited by ";" with no space, for example "Eye;Intestine".

This causes issues because multivalued values don't survive a cycle of enter data -> validate -> export as data objects -> load data objects.
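To illustrate the kind of mismatch described above, here is a minimal sketch. The three functions are hypothetical stand-ins for code paths that treat whitespace differently. They are not DataHarmonizer's actual functions, and the loader that splits on "; " is an assumption made only to show how a round trip could break.

```typescript
// Hypothetical stand-in: the picker joins with a semicolon plus a space.
const formatFromPicker = (values: string[]): string => values.join("; ");

// Hypothetical stand-in: validation re-joins with a bare semicolon.
const formatOnValidate = (cell: string): string =>
  cell
    .split(";")
    .map((v) => v.trim())
    .join(";");

// Hypothetical stand-in: a loader that only recognizes the picker's "; " delimiter.
const parseOnLoad = (cell: string): string[] => cell.split("; ");

const afterPicker = formatFromPicker(["Eye", "Intestine"]); // "Eye; Intestine"
const afterValidate = formatOnValidate(afterPicker); // "Eye;Intestine"
const reloaded = parseOnLoad(afterValidate); // one element, not two
```

Under these assumptions the reloaded cell holds a single value, "Eye;Intestine", rather than the two values that were originally picked.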

From a technical point of view there are a number of places where we do a split or join on multivalued values, but they often differ just a bit in how they treat whitespace. I'll put together a PR to introduce utility methods to normalize parsing and formatting multivalued values.
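As a rough sketch of what such utility methods might look like (this is not the code from the PR): the names `parseMultivalued`, `formatMultivalued` and `MULTIVALUE_DELIMITER` are hypothetical, and the choice of "; " as the canonical display form is an assumption.

```typescript
// Hypothetical delimiter constant; the canonical output form is an assumption.
const MULTIVALUE_DELIMITER = ";";

/** Split a cell into trimmed, non-empty values, ignoring whitespace around the delimiter. */
function parseMultivalued(cell: string | null | undefined): string[] {
  if (!cell) return [];
  return cell
    .split(MULTIVALUE_DELIMITER)
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
}

/** Join values into one canonical cell string: delimiter followed by a single space. */
function formatMultivalued(values: string[]): string {
  return values
    .map((value) => value.trim())
    .filter((value) => value.length > 0)
    .join(`${MULTIVALUE_DELIMITER} `);
}
```

If every split and join went through one pair of functions like this, "Eye;Intestine" and "Eye; Intestine" would parse to the same list and would always be written back in the same form.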

ddooley commented 1 year ago

So it looks like we should always strip space around all multi-valued entries? Does it make sense to silently clean (normalize) multivalued fields on load, or just during validation?

pkalita-lbl commented 1 year ago

I think using the same logic on loading and validation is important. See PR for my suggested fix.
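One way to picture "the same logic on loading and validation" is a single normalizer that both paths call. The sketch below is only illustrative: `normalizeMultivaluedCell`, `onLoadCell` and `onValidateCell` are hypothetical names, and the "; " output form is an assumption.

```typescript
// Hypothetical helper: a single normalization shared by every code path.
function normalizeMultivaluedCell(cell: string): string {
  return cell
    .split(";")
    .map((v) => v.trim())
    .filter((v) => v !== "")
    .join("; ");
}

// Hypothetical hooks; both delegate to the same helper.
const onLoadCell = (raw: string): string => normalizeMultivaluedCell(raw);
const onValidateCell = (raw: string): string => normalizeMultivaluedCell(raw);

const loadedCell = onLoadCell("Eye;Intestine");
const validatedCell = onValidateCell(loadedCell);
```

Because the normalizer is idempotent, a value cleaned on load is left unchanged by a later validation, so the cell text does not shift between steps.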