cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License
92 stars, 25 forks

Additional column groups besides required and recommended? #210

Closed: turbomam closed 2 years ago

turbomam commented 3 years ago

Could DataHarmonizer support additional column groups?

One application I'm thinking of is support for all of the various MIxS packages in one template. I think there are 10 or 15 now. Maybe that's too many. Maybe I should just create a separate DataHarmonizer template for each package, because they could require different validation rules.

ddooley commented 3 years ago

If we had a feature that allowed several templates to be merged into one, would that satisfy what you're describing? Columns that the templates share could be merged into a new "shared" section. DataHarmonizer's JSON data structure for tabular columns should be amenable to this. The conversion to LinkML touches on this too: LinkML seems to offer the potential to merge n specifications into one.
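A rough sketch of the merge idea, written in Python for brevity rather than in DataHarmonizer's own JavaScript. It does not use DataHarmonizer's actual JSON structure or any LinkML API: the `merge_templates` function, the template names, and the dict-of-column-lists layout are all assumptions made for illustration. Columns that appear in more than one template are pulled into a "shared" section, and each template keeps only its own columns.

```python
from collections import Counter


def merge_templates(templates):
    """Merge templates into sections, moving common columns to 'shared'.

    `templates` maps a (hypothetical) template name to its list of column
    names. A column counts as shared if it appears in two or more templates.
    """
    # Count how many templates each column appears in (once per template).
    counts = Counter(col for cols in templates.values() for col in set(cols))

    shared = []
    merged = {"shared": shared}
    for name, cols in templates.items():
        own = []
        for col in cols:
            if counts[col] > 1:
                # Keep first-seen order and avoid duplicates in 'shared'.
                if col not in shared:
                    shared.append(col)
            else:
                own.append(col)
        merged[name] = own
    return merged


# Illustrative input only; real templates carry far more than column names.
templates = {
    "soil": ["lat_lon", "env_broad_scale", "ph"],
    "water": ["lat_lon", "env_broad_scale", "salinity"],
}
merged = merge_templates(templates)
print(merged)
```

A real merge would also need to reconcile differing validation rules for a column that two templates share, which this sketch ignores.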

turbomam commented 3 years ago

Yes, that might be sufficient.

ddooley commented 3 years ago

Is the typical use case that a given sample pertains to more than one MIxS package, so that a user wants to generate tabular data for just a few packages? And would the remaining unselected packages simply have their fields dropped from the managed/outputted data?
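If that is the use case, dropping the fields of unselected packages could look roughly like the Python sketch below. Everything in it is hypothetical: the `select_sections` and `project_row` helpers, the section layout, and the sample values are assumptions for illustration, not DataHarmonizer behaviour.

```python
def select_sections(sections, selected, always_keep=("shared",)):
    """Keep only the chosen package sections plus any always-kept ones."""
    keep = set(selected) | set(always_keep)
    return {name: cols for name, cols in sections.items() if name in keep}


def project_row(row, sections):
    """Drop every field of a data row that is not in a retained section."""
    allowed = {col for cols in sections.values() for col in cols}
    return {col: val for col, val in row.items() if col in allowed}


# Illustrative sections: one shared block plus two package-specific blocks.
sections = {
    "shared": ["lat_lon", "env_broad_scale"],
    "soil": ["ph"],
    "water": ["salinity"],
}
row = {
    "lat_lon": "49.28 N 122.92 W",
    "env_broad_scale": "example biome",
    "ph": 6.5,
    "salinity": None,
}

kept = select_sections(sections, ["soil"])
out = project_row(row, kept)
print(out)
```

Here the user selects only the soil package, so `salinity` is dropped from the output while the shared fields and `ph` remain.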

dehays commented 3 years ago

@turbomam Mark - I'd suggest beginning with a DH template per MIxS environment package. There are a few differences in required fields and in field semantics between packages. @ddooley's idea of merging templates (or mixins) might be useful, since each MIxS environment package shares those ~10 required fields (geo location + EnvO triad).

ddooley commented 2 years ago

I think this is resolved now, insofar as both Mark's branch https://turbomam.github.io/DataHarmonizer/main.html and the latest DH linkml-datastructure branch show multiple MIxS templates generated from a single LinkML set of MIxS files, with lots of field sections visible in them. OK to close this?

turbomam commented 2 years ago

I think the section jumping plus the multiple-template menu from linkml-datastructure meet our needs in NMDC. Thanks.