cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

rethink is_a dh_interface requirement for including class in menu system? #338

Open turbomam opened 1 year ago

turbomam commented 1 year ago

I think that LinkML schema classes aren't included in the template menu unless they declare is_a: dh_interface

is_a is single-valued, so reserving it for this purpose eliminates its usefulness for making real hierarchical statements about classes. We could switch to

mixins:

but I think @cmungall may have something else in mind
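To make the is_a vs. mixins trade-off concrete, here is a minimal sketch. It assumes the schema has already been loaded into a plain dict (as a YAML loader would produce); the class names `Sample`, `SoilSample`, and `LegacyTemplate` and the helper `menu_classes` are hypothetical and are not DataHarmonizer's actual menu code. It only checks the direct parent and direct mixins, not the full ancestry.

```python
# Hypothetical schema fragment, written as the dict a YAML loader would produce.
schema = {
    "classes": {
        "dh_interface": {"mixin": True},
        "Sample": {},
        # Using mixins leaves is_a free for a real parent class.
        "SoilSample": {"is_a": "Sample", "mixins": ["dh_interface"]},
        # Using is_a for the marker uses up the single parent slot.
        "LegacyTemplate": {"is_a": "dh_interface"},
    }
}


def menu_classes(schema, marker="dh_interface"):
    """Return names of classes that carry the marker via direct is_a or mixins."""
    selected = []
    for name, cls in schema["classes"].items():
        cls = cls or {}  # a class body may be empty in YAML
        if cls.get("is_a") == marker or marker in cls.get("mixins", []):
            selected.append(name)
    return selected


print(menu_classes(schema))
```

With the mixin form, `SoilSample` still shows up in the menu while keeping `Sample` as its real parent.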

cc @pkalita-lbl

ddooley commented 1 year ago

Ah, I see, is_a being single-valued could be problematic vis-à-vis inheritance etc.

d.

ddooley commented 7 months ago

LinkML has a "tree_root" marker that can be set explicitly (or is otherwise inferred within various LinkML interrogation methods as a class that doesn't show up in the range of another slot). Is "tree_root" a better candidate than "dh_interface"?

pkalita-lbl commented 7 months ago

I think there's a decent argument to be made for using tree_root. Lots of other LinkML tools that need to look at a particular class within a schema will use tree_root as part of the process of identifying which class to look at. One word of caution, though: I don't think it's strictly enforced in any way, but I think most tools assume there is only one class in a schema with tree_root: true. See for example this utility method in the LinkML codebase: https://github.com/linkml/linkml/blob/2407a2f2c629092c15da6e0295600d895d34a465/linkml/utils/datautils.py#L69-L80

If sticking with the one-tree-root-per-schema convention works for you then I would encourage using that. If you need to mark multiple classes per schema as DH interfaces, then you might think about coming up with a convention that uses the annotations slot on class definitions.
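As a rough illustration of the two options, the sketch below works on a schema held as a plain dict. It is not the LinkML utility linked above; the helpers `find_tree_root` and `annotated_classes`, the annotation tag `dh_interface`, and the class names are all assumptions made for the example.

```python
# Hypothetical schema fragment as a plain dict.
example_schema = {
    "classes": {
        "Container": {"tree_root": True},
        "Sample": {"annotations": {"dh_interface": True}},
        "Isolate": {"annotations": {"dh_interface": True}},
        "Helper": {},
    }
}


def find_tree_root(schema):
    """Return the single class marked tree_root: true, or None if there is none.

    Raises ValueError if more than one class is marked, mirroring the
    one-tree-root-per-schema assumption.
    """
    roots = [
        name
        for name, cls in schema["classes"].items()
        if (cls or {}).get("tree_root")
    ]
    if len(roots) > 1:
        raise ValueError(f"expected at most one tree_root class, found {roots}")
    return roots[0] if roots else None


def annotated_classes(schema, tag="dh_interface"):
    """Return every class whose annotations contain the given tag."""
    return [
        name
        for name, cls in schema["classes"].items()
        if tag in (cls or {}).get("annotations", {})
    ]


print(find_tree_root(example_schema))
print(annotated_classes(example_schema))
```

The annotation route allows any number of classes to be marked, at the cost of being a project-specific convention that other LinkML tools would not know about.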

ddooley commented 5 months ago

What we are seeing with the new DataHarmonizer one-to-many data schema is that we need a "Container" class in a LinkML specification that lists all the classes to show as tables, and their relationships as defined by primary keys. So we won't be needing dh_interface any more.
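A minimal sketch of how such a container might be read, assuming each Container attribute has a class as its range and each table class marks its primary key with `identifier: true`. The class names, slot names, and the helper `tables_in_container` are hypothetical; this is not the DataHarmonizer implementation.

```python
# Hypothetical one-to-many schema fragment as a plain dict.
container_schema = {
    "classes": {
        "Container": {
            "tree_root": True,
            "attributes": {
                "Samples": {"range": "Sample", "multivalued": True},
                "Isolates": {"range": "Isolate", "multivalued": True},
            },
        },
        "Sample": {
            "attributes": {"sample_id": {"identifier": True}},
        },
        "Isolate": {
            "attributes": {
                "isolate_id": {"identifier": True},
                # Assumed foreign key: points back at the Sample table.
                "sample_id": {"range": "Sample"},
            },
        },
    }
}


def tables_in_container(schema, container="Container"):
    """Map each class listed in the container to its identifier slot (or None)."""
    classes = schema["classes"]
    tables = {}
    for attr in classes[container].get("attributes", {}).values():
        target = attr.get("range")
        if target not in classes:
            continue  # skip primitive ranges such as string
        key = next(
            (
                slot
                for slot, spec in classes[target].get("attributes", {}).items()
                if (spec or {}).get("identifier")
            ),
            None,
        )
        tables[target] = key
    return tables


print(tables_in_container(container_schema))
```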
