cidgoh / pathogen-genomics-package

This is the DataHarmonizer spreadsheet web application bundled with pathogen genomics data entry and validation templates
MIT License

Canadian MPX template: age bin data overwritten when a saved file is re-opened in the DH #1

Closed griffie closed 2 years ago

griffie commented 2 years ago

Currently in the Canadian MPX template, a user can have a dataset in which the age value is a null value but an age bin has been entered instead (so that the exact age is obfuscated but the general age range is shared). There is code in the DH that says "if there is a null value in the age field, put the same null value in the age bin field", which was created to automate populating associated fields and reduce data entry. If the user saves the dataset and re-opens it later, the saved age bin data gets overwritten with the same null value as in the age field.

This is erasing information that the user has entered. And we had the same issue in the CanCOGeN template.

You put a fix in place in the CanCOGeN template so that upon opening a file, whatever is in the age bin field will remain untouched. BUT if a user is entering fresh data, if they enter a null value for the age field, the age bin field still autofills the same null value (which they can edit themselves if they want).

Can you put the same fix into the MPX template that you put in for the CanCOGeN template to address this issue?

Thanks!

ddooley commented 2 years ago

I'm looking into this now. One thing though: there was no special coding for CanCOGeN in this respect, so the bin behaviour is currently the same for both CanCOGeN and MPX. It is activated for any set of columns named [field], [field + "_unit"], [field + "_bin"].
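To make the naming convention concrete, here is a minimal sketch of how such column groups could be detected. It is not the DataHarmonizer code; the function name `find_binned_fields` and the example column names are assumptions made up for illustration.

```python
def find_binned_fields(columns):
    """Return base field names that have both '<field>_unit' and '<field>_bin' siblings.

    Hypothetical helper: it only illustrates the [field], [field + "_unit"],
    [field + "_bin"] convention described above.
    """
    names = set(columns)
    return [c for c in columns if f"{c}_unit" in names and f"{c}_bin" in names]


# Example column names are invented for this sketch.
columns = ["sample_id", "host_age", "host_age_unit", "host_age_bin"]
binned = find_binned_fields(columns)
```

With these example columns, only `host_age` qualifies, because it is the only name with both a `_unit` and a `_bin` sibling.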

griffie commented 2 years ago

Hmm, I seem to remember we came up with a fix for this before, but now that I'm testing the latest version and an older version, I see the same overwrite in the CanCOGeN template. So we would need to fix this in both templates (or all templates with that behaviour).

Can we not just have it that the rule(s) that carry over the age null values into the bin field do not apply upon opening an existing file? So no automations (across the board) should occur on an opened file, only when a user creates/enters new data or creates a new file?

ddooley commented 2 years ago

I suspect the upgrade to LinkML accidentally toasted it. That aside, we have 2 mechanisms: on-load cleanup, and just-in-time cleanup as the user is changing a value in a field. I've figured out the problem: the on-load cleanup wasn't reading existing values into its rules. I'll change that, deliver it to pathogen-genomics-package master, and then we can decide whether to do it on load.

If not on load, we need the bulk column value normalization function somewhere in the UI. An alternative is to do it on validation, but that could affect performance.
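As a rough sketch of the behaviour being discussed (not the actual DataHarmonizer implementation), the difference between the two mechanisms could look like the following. The function name `sync_bin`, the `on_load` flag, and the `NULL_VALUES` list are all assumptions for illustration; the real null values come from the template.

```python
# Illustrative null values only; the real list is defined by the template.
NULL_VALUES = {"Not Applicable", "Missing", "Not Collected",
               "Not Provided", "Restricted Access"}


def sync_bin(age, current_bin, on_load):
    """Return the value the age bin field should hold (hypothetical rule).

    on_load=True  : on-load cleanup; an existing saved bin value is kept.
    on_load=False : just-in-time cleanup while the user edits the age field;
                    a null age is copied into the bin, which the user can
                    still edit afterwards.
    """
    if age in NULL_VALUES:
        if on_load and current_bin:
            return current_bin  # do not overwrite what the user saved
        return age              # carry the null value over to the bin
    return current_bin          # non-null age: this rule leaves the bin alone


reopened = sync_bin("Restricted Access", "20 - 29", on_load=True)
fresh_entry = sync_bin("Restricted Access", "", on_load=False)
```

Here `reopened` keeps the saved bin `"20 - 29"`, while `fresh_entry` receives the null value from the age field, which matches the behaviour requested earlier in the thread.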

ddooley commented 2 years ago

The fix is in a pull request, along with a new load JSON function. Just awaiting some input from the NMDC folks, and can probably release this morning.