Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/latest/cli_reference.html
MIT License

Feature request: provide way to autopopulate Valid Values from children #329

Open allaway opened 3 years ago

allaway commented 3 years ago

For example, we have attributes defined for a parent tissue with children:

medial dorsal nucleus of thalamus, posterior inferior parietal cortex, midbrain, cerebral cortex, frontal lobe, hippocampus, nerve tissue, dorsolateral prefrontal cortex, anterior cingulate cortex, frontal pole, parahippocampal gyrus, superior temporal gyrus, inferior frontal gyrus, cerebellum, occipital visual cortex, inferior temporal gyrus, middle temporal gyrus, posterior cingulate cortex, temporal pole, precentral gyrus, superior parietal lobe, prefrontal cortex, amygdala, caudate nucleus, nucleus accumbens, putamen, temporal cortex, orbitofrontal cortex, ventrolateral prefrontal cortex, medial frontal cortex, primary motor cortex, primary somatosensory cortex, posteroinferior parietal cortex, primary auditory cortex, posterior superior temporal cortex, inferolateral temporal cortex, primary visual cortex, amygdaloid complex, striatum, cerebellar cortex, serum, plasma, splenocyte, blood, primary tumor, Not Applicable, embryonic tissue, meninges, forebrain, medial orbital frontal cortex, medial prefrontal cortex, inferior temporal cortex, middle frontal gyrus, cortical plate, VZ/SVZ, dorsal pallium, bone marrow, Buccal Mucosa, Dorsal Root Ganglion, unspecified, whole brain, Buffy Coat, frontal cortex, olfactory neuroepithelium

Each of these children is defined as its own attribute as well, with parent tissue. It would be nice to automatically define these as valid values (say, if the valid value property for tissue was left blank), rather than having to add a new attribute and then also add it to the list of eligible valid values every time we want to define a new tissue.
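To make the request concrete, here is a minimal sketch of the behavior described above. It is not schematic's code: the column names (`Attribute`, `Parent`, `Valid Values`), the sample rows, and the helper `fill_valid_values_from_children` are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, trimmed-down data model CSV; the column names are assumed.
MODEL_CSV = """Attribute,Parent,Valid Values
tissue,,
hippocampus,tissue,
cerebellum,tissue,
plasma,tissue,
species,,"Human, Mouse"
"""


def fill_valid_values_from_children(rows):
    """Fill blank 'Valid Values' cells with the attributes whose Parent is that row."""
    children = defaultdict(list)
    for row in rows:
        parent = row["Parent"].strip()
        if parent:
            children[parent].append(row["Attribute"])
    for row in rows:
        # Only touch rows that were left blank and that actually have children.
        if not row["Valid Values"].strip() and row["Attribute"] in children:
            row["Valid Values"] = ", ".join(children[row["Attribute"]])
    return rows


rows = fill_valid_values_from_children(list(csv.DictReader(io.StringIO(MODEL_CSV))))
by_name = {row["Attribute"]: row["Valid Values"] for row in rows}
print(by_name["tissue"])
```

Rows that already list valid values (here `species`) are left alone, so explicit lists would still take precedence over the inferred ones.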

allaway commented 3 years ago

BTW - I'd add a label for this, but don't have the appropriate permission to do that. :)

milen-sage commented 3 years ago

@allaway you should have access to labels now?

milen-sage commented 3 years ago

This is an interesting feature request since we had that functionality and deprecated it... (actually the code is even still in but we don't expose it through the csv logic).

The parent:children relationship works to autopopulate valid values the way Sage vocabulary is organized currently, but not necessarily for other types of schemas (e.g. there are often valid values that are not children of an object).

We can consider re-surfacing the functionality. But there is a different feature that we are trying to explore: auto populate valid values from source ontologies. That way we'd automatically be consistent with standard terms if we pick a standard ontology; or we could pull terms from say existing Sage controlled vocabularies.

allaway commented 3 years ago

@milen-sage i can now, thanks!

Yeah, it makes sense that you would want to explicitly define the values in some cases. I think your proposal makes sense, though many of the attributes I'm thinking of, the ones with long lists of terms (e.g. model systems), don't have good ground-truth ontologies.

milen-sage commented 3 years ago

Yes, in those cases we'd like to be able to point to "non-standard" sources (e.g. the values in Sage controlled vocabularies for a given term/key) stored in a standard format (e.g. jsonschema or json-ld). As long as we have a consistent format that we can read from, we can pull from our own references.
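As a rough illustration of pulling values from such a reference, the sketch below reads the `enum` for one key out of a JSON Schema document. The vocabulary content, the key `modelSystem`, and the helper `valid_values_from_schema` are hypothetical; this only shows the general idea, not an existing schematic feature.

```python
import json

# Hypothetical controlled-vocabulary document in JSON Schema form.
VOCAB_JSON = """
{
  "properties": {
    "modelSystem": {"enum": ["cell line", "organoid", "xenograft"]},
    "notes": {"type": "string"}
  }
}
"""


def valid_values_from_schema(schema, key):
    """Return the enum listed for `key`, or an empty list if there is none."""
    return list(schema.get("properties", {}).get(key, {}).get("enum", []))


schema = json.loads(VOCAB_JSON)
model_system_values = valid_values_from_schema(schema, "modelSystem")
print(model_system_values)
```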

I think that's one case where triaging would help, @ychae. I.e., how important is this feature, and when will use cases depending on it become relevant?

E.g. if this is "Important - Short-term" we can revive functionality we had before.

allaway commented 3 years ago

From the NF perspective:

The priority is low to medium - time saver, and nice to have (but definitely less of an issue now that it only takes a few minutes to generate the json-ld after making changes to the csv). The timing - It's relevant in the short term (for our use case).

On Wed, Oct 28, 2020 at 2:54 PM milen-sage notifications@github.com wrote:

Yes, in those cases we'd like to be able to point to "non-standard" sources (e.g. the values in Sage controlled vocabularies for a given term/key) stored in a standard format (e.g. jsonschema or json-ld). As long as we have a consistent format that we can read from, we can pull from our own references.

I think that's one case where triaging would help @ychae? I.e. how important is this feature

  • "Low - it's an enhancement but not crucial for work"
  • "Medium - can do work w/o it; but important (e.g. to save time or for convenience)"
  • "Important - it's a blocker and can't do work w/o it"

when will use cases depending on this become relevant

  • "Short-term - 2-4 weeks"
  • "Mid-term - 2-4 months"
  • "Long-term - 6 months - 1 year"

E.g. if this is "Important - Short-term" we can revive functionality we had before.


BrunoGrandePhD commented 3 years ago

I agree that this feature would be useful for cutting down redundancy. D.R.Y. F.T.W.!

A temporary solution until we utilize source ontologies could be a toggle to enable the existing (but currently disabled) functionality via a keyword under Validation Rules. For example, users could specify ValidValuesFromChildren under Validation Rules, and schematic would then internally populate Valid Values with a list of attributes whose parent is the given attribute.

Though, I feel like this might be oversimplifying the process. Is my thinking valid?
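A minimal sketch of the toggle idea, under stated assumptions: the keyword `ValidValuesFromChildren` comes from the comment above, but the in-memory layout (a dict of attribute specs with `parent`, `rules`, and `valid_values`) and the helper `apply_children_toggle` are made up here and do not reflect how schematic parses Validation Rules.

```python
# Keyword proposed in the thread; how schematic would parse it is an assumption.
TOGGLE = "ValidValuesFromChildren"


def apply_children_toggle(attributes):
    """attributes: dict mapping name -> {"parent", "rules", "valid_values"}."""
    for name, spec in attributes.items():
        if TOGGLE not in spec["rules"]:
            continue  # attributes without the keyword keep their explicit values
        spec["valid_values"] = [
            child
            for child, child_spec in attributes.items()
            if child_spec["parent"] == name
        ]
    return attributes


model = {
    "tissue": {"parent": None, "rules": [TOGGLE], "valid_values": []},
    "midbrain": {"parent": "tissue", "rules": [], "valid_values": []},
    "serum": {"parent": "tissue", "rules": [], "valid_values": []},
    "assay": {"parent": None, "rules": [], "valid_values": ["rnaSeq"]},
}
apply_children_toggle(model)
print(model["tissue"]["valid_values"])
```

Making the behavior opt-in per attribute would keep schemas whose valid values are not children of the attribute unaffected.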

milen-sage commented 3 years ago

Yes, I think that should work short term @BrunoGrandePhD - could you check whether, when range_value_relationship here

https://github.com/Sage-Bionetworks/schematic/blob/41c181fcc3e56fbc879aaecc053f357f184030d2/schematic/schemas/generator.py#L24

is set to 'parentOf', the list of valid values gets populated correctly? A lot of time has passed since I wrote the code and then updated it, so hopefully the obsolete behavior still works as expected :) If that's the case, then the toggle you suggest would only need to set the range_value_relationship attribute to 'parentOf' in the schemas/generator.py constructor.

If this is helpful in any way, this is the line where valid values for each node are determined (those are used to populate the manifest dropdowns): https://github.com/Sage-Bionetworks/schematic/blob/41c181fcc3e56fbc879aaecc053f357f184030d2/schematic/schemas/generator.py#L497 and that just uses range_value_relationship

I think that's the only place in the generator, but you might want to double-check.
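For readers unfamiliar with the idea, here is a toy sketch of how one relationship setting can decide which edges supply valid values. It is not the code in generator.py: the class `ToyGenerator`, the triple-based edge list, and the default relationship name `rangeValue` are assumptions for illustration; only `range_value_relationship` and 'parentOf' are taken from the comment above.

```python
class ToyGenerator:
    """Toy stand-in for illustration; not schematic's generator class."""

    def __init__(self, edges, range_value_relationship="rangeValue"):
        # edges: iterable of (source, target, relationship) triples
        self.edges = list(edges)
        self.range_value_relationship = range_value_relationship

    def valid_values(self, node):
        """Targets reachable from `node` over the configured edge type."""
        return [
            target
            for source, target, rel in self.edges
            if source == node and rel == self.range_value_relationship
        ]


EDGES = [
    ("tissue", "amygdala", "parentOf"),
    ("tissue", "putamen", "parentOf"),
    ("tissue", "unspecified", "rangeValue"),
]

explicit = ToyGenerator(EDGES).valid_values("tissue")
from_children = ToyGenerator(EDGES, "parentOf").valid_values("tissue")
print(explicit, from_children)
```

In this toy version, switching the constructor argument is the only change needed to move from explicitly listed values to children-derived ones, which mirrors the single-attribute toggle suggested above.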

ychae commented 3 years ago

@milen-sage has this issue been resolved?

BrunoGrandePhD commented 3 years ago

I don't think this has been implemented yet. IMHO, this needs to be included in the PyPI release.

BrunoGrandePhD commented 3 years ago

This is a nice-to-have feature for imCORE. I can tackle this once I have a few spare hours since it's just resurfacing something that was previously implemented.

brynnz22 commented 1 year ago

This would be nice for MC2 as well!

milen-sage commented 1 year ago

Noted and triaged.

milen-sage commented 1 year ago

It's dependent on a refactor of data model schema functionality that @mialy-defelice is currently working on. Once the refactor is done, features like this would be much easier to add.