cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Conditional select menus (pulldowns, LinkML enums, etc) #302

Open turbomam opened 2 years ago

turbomam commented 2 years ago

NMDC use case:

In addition to the MIxS environmental triad of EnvO terms, NMDC also describes the origin of biosamples with 5-part GOLD paths.

Interactive GOLD paths explorer: https://gold.jgi.doe.gov/ecosystemtree?mode=organism

Journal article: https://www.researchgate.net/publication/328539814_Genomes_OnLine_database_GOLD_v7_updates_and_new_features

There are thousands of valid combinations of the five GOLD path ranks, so we don't want to build a single pre-composed pulldown. Instead, we have one pulldown for each rank. But you can't mix the values of the five ranks indiscriminately: they have to form valid paths, like

  1. Ecosystem Category → Terrestrial
  2. Ecosystem → Environmental
  3. Specific Ecosystem → Agricultural land
  4. Ecosystem Type → Soil
  5. Ecosystem Subtype → Clay

We would like a selection from Ecosystem Category to constrain the values available in the columns for Ecosystem and the lower ranks.
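The constraint described above can be sketched as a lookup over the list of valid paths: given the values already chosen for the higher ranks, offer only the values that continue at least one valid path. This is a minimal sketch, not DataHarmonizer code. The names `RANKS`, `VALID_PATHS`, and `allowed_values` are hypothetical, the rank order is copied from the list in this comment, and only the first path comes from the thread; the other two are made up for illustration.

```python
from typing import List, Sequence, Tuple

# Rank order copied from the list in this comment (an assumption; adjust
# it to whatever order the GOLD spec actually uses).
RANKS: Tuple[str, ...] = (
    "ecosystem_category",
    "ecosystem",
    "specific_ecosystem",
    "ecosystem_type",
    "ecosystem_subtype",
)

# Hypothetical path table. Only the first row comes from the thread;
# the other two are invented placeholders, not checked against GOLD.
VALID_PATHS: List[Tuple[str, ...]] = [
    ("Terrestrial", "Environmental", "Agricultural land", "Soil", "Clay"),
    ("Terrestrial", "Environmental", "Agricultural land", "Soil", "Loam"),
    ("Aquatic", "Environmental", "Unclassified", "Freshwater", "Lake"),
]


def allowed_values(
    selected: Sequence[str],
    paths: Sequence[Tuple[str, ...]] = VALID_PATHS,
) -> List[str]:
    """Return the menu options for the next rank.

    `selected` holds the values already chosen, highest rank first.
    A value is offered only if some valid path starts with `selected`
    followed by that value.
    """
    depth = len(selected)
    if depth >= len(RANKS):
        return []  # every rank is already filled in
    prefix = tuple(selected)
    return sorted({path[depth] for path in paths if path[:depth] == prefix})


if __name__ == "__main__":
    print(allowed_values([]))
    print(allowed_values(["Terrestrial", "Environmental", "Agricultural land", "Soil"]))
```

With an empty selection the function returns every top-rank value; each further selection narrows the next column's menu. The same prefix check could also back validation of a completed row.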

turbomam commented 2 years ago

See https://deploy-preview-101--voluble-pika-79eed4.netlify.app/linkml.html?template=nmdc_dh/soil_emsl_jgi_mg

ddooley commented 2 years ago

So I see this table in the GOLD paper that summarizes the dependency:

[image: dependency table from the GOLD paper]

I guess the first challenge is how to wind these relationships into the LinkML enums? Has that been done yet? A particular enum like "composting" (an ecosystem type enum) could have a "depends on" or "pertinent to" relation pointing to Ecosystem category "solid waste"? If that were coded in then DH would have the data to accomplish both validation and menu filtering.

turbomam commented 2 years ago

> I guess the first challenge is how to wind these relationships into the LinkML enums? Has that been done yet?

No, not yet. That's on me.

turbomam commented 2 years ago

Patrick found GOLD's 5-Level Ecosystem Classification Paths Excel spreadsheet at the bottom of this page.

ddooley commented 2 years ago

So it's a polyhierarchy, and terms like "Biofilm" can occur at several levels too. This argues for having each enumeration's items have a 'pertinent to' relation to one or more other enumeration items. Is that doable? Seems like it should be baked into the LinkML schema?
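To make the 'pertinent to' idea concrete, here is a rough sketch of how such relations could drive both menu filtering and validation. It is not LinkML syntax and not an existing DataHarmonizer or LinkML feature; `PERTINENT_TO`, `is_pertinent`, `menu_options`, and `validate_row` are hypothetical names. The composting/solid waste pair is the example from earlier in this thread; the "Biofilm" and "Wastewater" entries are invented to show one term appearing in two enums.

```python
from typing import Dict, List, Set, Tuple

Item = Tuple[str, str]  # (enum name, permissible value)

# Hypothetical relation table: each item lists the items in other enums
# it is pertinent to. Only the Composting entry echoes the thread; the
# Biofilm entries are invented for illustration.
PERTINENT_TO: Dict[Item, Set[Item]] = {
    ("ecosystem_type", "Composting"): {("ecosystem_category", "Solid waste")},
    ("ecosystem_type", "Biofilm"): {
        ("ecosystem_category", "Solid waste"),
        ("ecosystem_category", "Wastewater"),
    },
    ("ecosystem_subtype", "Biofilm"): {("ecosystem_type", "Composting")},
}


def is_pertinent(item: Item, row: Dict[str, str]) -> bool:
    """True if `item` is compatible with the values already in `row`.

    Items without any relations are treated as unconstrained, and a
    related enum that is still empty in the row does not block the item.
    """
    relations = PERTINENT_TO.get(item, set())
    for enum in {related_enum for related_enum, _ in relations}:
        chosen = row.get(enum)
        if chosen is not None and (enum, chosen) not in relations:
            return False
    return True


def menu_options(enum: str, row: Dict[str, str]) -> List[str]:
    """Menu filtering: constrained values of `enum` that still fit `row`."""
    return sorted(
        value
        for (item_enum, value) in PERTINENT_TO
        if item_enum == enum and is_pertinent((item_enum, value), row)
    )


def validate_row(row: Dict[str, str]) -> List[str]:
    """Validation: one message per cell that conflicts with the others."""
    return [
        f"{enum}={value!r} conflicts with other selections"
        for enum, value in row.items()
        if not is_pertinent((enum, value), row)
    ]
```

Because the relations are keyed by (enum, value) pairs rather than by value alone, "Biofilm" as an ecosystem type and "Biofilm" as an ecosystem subtype can carry different dependencies, which is what a polyhierarchy needs.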

pkalita-lbl commented 2 years ago

Just a quick update, I had some success yesterday at getting the autocomplete dropdown menus populating with the correct GOLD path elements. I've pushed that work up in this branch. That work is mainly focused on the Handsontable mechanics. The GOLD ecosystem path rules are not coming from the LinkML schema yet, but rather from an external JSON file (well, technically a JS file because of how DataHarmonizer loads resources). I am investigating how the GOLD ecosystem path spec could be translated to LinkML, but not much to report there yet.
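For readers wondering what such an external rules file might look like, one plausible shape is a nested object in which each key is a value at one rank and its children are the values allowed at the next rank. The sketch below builds that structure from flat path rows (such as rows exported from the GOLD spreadsheet) and reads menu options back out of it. The layout of the real file in the branch is not shown in this thread, so the nested format, `build_tree`, `options`, and the sample rows are all assumptions; only the first row comes from the thread.

```python
import json
from typing import Dict, Iterable, List, Sequence


def build_tree(paths: Iterable[Sequence[str]]) -> Dict[str, dict]:
    """Fold flat paths into a nested dict, one nesting level per rank."""
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for value in path:
            node = node.setdefault(value, {})
    return tree


def options(tree: Dict[str, dict], selected: Sequence[str]) -> List[str]:
    """Walk the tree along `selected` and list the children found there."""
    node = tree
    for value in selected:
        node = node.get(value)
        if node is None:
            return []  # the selection so far is not a valid path prefix
    return sorted(node)


# Hypothetical rows: the first is from the thread, the second is invented.
rows = [
    ("Terrestrial", "Environmental", "Agricultural land", "Soil", "Clay"),
    ("Terrestrial", "Environmental", "Agricultural land", "Soil", "Loam"),
]

tree = build_tree(rows)
# The serialized form is what would be shipped as the external JSON file.
serialized = json.dumps(tree, indent=2, sort_keys=True)
```

A nested layout keeps each dropdown lookup to a simple walk down the tree, at the cost of repeating shared subtrees when the same term appears under several parents.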

ddooley commented 7 months ago

How is the need for this functionality in DH looking now? It seems like validation rules wouldn't handle this; instead dynamic filtering is required.

pkalita-lbl commented 7 months ago

It has been quite a while since I worked on this in NMDC, but as far as I recall we were able to do what we needed to do with some custom code on the NMDC side. I'm not sure there's anything else here that DH needs to solve.