cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

template meta builder (entry of select lists for lazy people) #212

Closed: turbomam closed 2 years ago

turbomam commented 3 years ago

There is a geo_loc_name section in the CanCOGeN template. I assume somebody entered the content below mostly by hand?

I would be interested in a tool that could populate the following into the template just by specifying the parent term GAZ:00002561 'Province (Canada)'. Then the user could add, delete or modify any of that.

Or instead of a tool, it could just be some documentation/hints... see further down

Ontology ID   Meaning (LinkML)  parent class                             label
ID                              SC %
GAZ_00002566                    geo_loc_name (state/province/territory)  Alberta
GAZ_00002562                    geo_loc_name (state/province/territory)  British Columbia
GAZ_00002571                    geo_loc_name (state/province/territory)  Manitoba
GAZ_00002570                    geo_loc_name (state/province/territory)  New Brunswick
GAZ_00002567                    geo_loc_name (state/province/territory)  Newfoundland and Labrador
GAZ_00002575                    geo_loc_name (state/province/territory)  Northwest Territories
GAZ_00002565                    geo_loc_name (state/province/territory)  Nova Scotia
GAZ:00002574                    geo_loc_name (state/province/territory)  Nunavut
GAZ_00002563                    geo_loc_name (state/province/territory)  Ontario
GAZ_00002572                    geo_loc_name (state/province/territory)  Prince Edward Island
GAZ_00002569                    geo_loc_name (state/province/territory)  Quebec
GAZ_00002564                    geo_loc_name (state/province/territory)  Saskatchewan
GAZ_00002576                    geo_loc_name (state/province/territory)  Yukon

Sample documentation

"Run the following query at" http://sparql.hegroup.org/sparql/

PREFIX  GAZ:  <http://purl.obolibrary.org/obo/GAZ_>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  oio:  <http://www.geneontology.org/formats/oboInOwl#>

SELECT  (str(?i) AS ?Ontology_ID) ("" AS ?Meaning_LinkML) 
("geo_loc_name (state/province/territory)" AS ?parent_class) (str(?l) AS ?label)
WHERE
  { GRAPH <http://purl.obolibrary.org/obo/merged/GAZ>
      { ?s  rdfs:subClassOf  GAZ:00002561 ;
            rdfs:label       ?l ;
            oio:id           ?i
      }
  }
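
To make the "just specify the parent term" idea concrete, here is a minimal Python sketch that builds a query of the same shape for any OBO parent CURIE and turns a standard SPARQL JSON result into template rows. The function names (`curie_to_iri`, `build_query`, `bindings_to_rows`) are hypothetical, the graph naming pattern is assumed to follow the `merged/GAZ` example above, and no network call is made here; sending the query to the endpoint is left to whatever HTTP client you prefer.

```python
# Sketch only: helper names are hypothetical, not part of DataHarmonizer.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oio:  <http://www.geneontology.org/formats/oboInOwl#>

SELECT (str(?i) AS ?Ontology_ID) (str(?l) AS ?label)
WHERE {{
  GRAPH <http://purl.obolibrary.org/obo/merged/{ontology}> {{
    ?s rdfs:subClassOf <{parent_iri}> ;
       rdfs:label ?l ;
       oio:id ?i
  }}
}}"""


def curie_to_iri(curie):
    """Expand an OBO CURIE such as 'GAZ:00002561' to its PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


def build_query(parent_curie):
    """Fill the query template for one parent term.

    Assumption: the named graph follows the 'merged/<PREFIX>' pattern
    seen in the GAZ example.
    """
    prefix = parent_curie.split(":", 1)[0]
    return QUERY_TEMPLATE.format(
        ontology=prefix, parent_iri=curie_to_iri(parent_curie)
    )


def bindings_to_rows(results, parent_class):
    """Convert a SPARQL JSON result (W3C results format) to template rows."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            "Ontology ID": binding["Ontology_ID"]["value"],
            "Meaning (LinkML)": "",
            "parent class": parent_class,
            "label": binding["label"]["value"],
        })
    # Sort by label so the select list is alphabetical, as in the table above.
    return sorted(rows, key=lambda row: row["label"])
```

The rows keep the ID exactly as the endpoint returns it; converting between the `GAZ:` and `GAZ_` forms would be a separate decision.
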

turbomam commented 3 years ago

@cmungall is suggesting that I do as much of the template building as possible by parsing the LinkML model of the various MIxS YAML files at https://github.com/GenomicsStandardsConsortium/mixs-source

turbomam commented 3 years ago

So GAZ content isn't important to us, and SPARQL against OntoBee probably isn't a solution we would use.

This issue is just a worked example of "how would you do automated template generation?"

ddooley commented 3 years ago

As a first step, a LinkML version of DataHarmonizer would at least allow select field sources to identify the ontology vocabulary branch(es) to fetch terms from, but there would likely need to be features for constraining that, such as fetching terms only to a certain depth. A past project offered a dynamic lookup service so that the project config files didn't need to have the whole vocabulary loaded. But this becomes a bit of a load when validation is required on such fields.
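
As a rough illustration of the "fetch terms to a certain depth" constraint, here is a minimal Python sketch of a depth-limited, breadth-first walk over a parent-to-children mapping. The function name `terms_to_depth` and the `EX:` identifiers are made up for the example; a real implementation would get the hierarchy from the ontology source rather than from a hand-written dict.

```python
from collections import deque


def terms_to_depth(children, root, max_depth):
    """Collect descendants of `root` down to `max_depth` levels below it.

    `children` maps a term ID to a list of its direct child IDs.
    Returns (term_id, depth) pairs in breadth-first order.
    """
    found = []
    seen = {root}  # guards against terms reachable by more than one path
    queue = deque([(root, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand below the requested depth
        for child in children.get(term, []):
            if child not in seen:
                seen.add(child)
                found.append((child, depth + 1))
                queue.append((child, depth + 1))
    return found


# Hypothetical hierarchy, for illustration only.
hierarchy = {
    "EX:root": ["EX:a", "EX:b"],
    "EX:a": ["EX:a1"],
    "EX:a1": ["EX:a1x"],
}
direct_only = terms_to_depth(hierarchy, "EX:root", 1)
two_levels = terms_to_depth(hierarchy, "EX:root", 2)
```

With `max_depth=1` only the direct children are returned, which corresponds to the province list case; larger values pull in deeper branches.
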

ddooley commented 3 years ago

P.S. Our previous GAZ and DO dynamic lookup system went straight to the OLS API. You can see it in action in example form at https://genepio.org/geem/form.html#GENEPIO:0001777 - any of the "lookup choice" fields will provide a popup if you provide an initial choice to start from. (It's a prototype; we would redo the code, btw.) If OntoBee isn't a solution, would OLS be one for dynamic lookup?
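
For readers unfamiliar with OLS, a dynamic lookup of a term's children might look roughly like the Python sketch below. This is not the GEEM prototype's code. It assumes the OLS REST layout in which a term's children live under `/ontologies/{ontology}/terms/{double-URL-encoded IRI}/children` and come back as HAL-style JSON with an `_embedded.terms` list; the base URL and helper names are assumptions, and the actual HTTP request is deliberately left out.

```python
from urllib.parse import quote

# Assumed base URL for the OLS REST API; adjust to the deployment in use.
OLS_BASE = "https://www.ebi.ac.uk/ols4/api"


def children_url(curie):
    """Build the (assumed) OLS URL listing the direct children of an OBO term."""
    prefix, local_id = curie.split(":", 1)
    iri = f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"
    # OLS expects the term IRI to be URL-encoded twice in the path.
    encoded = quote(quote(iri, safe=""), safe="")
    return f"{OLS_BASE}/ontologies/{prefix.lower()}/terms/{encoded}/children"


def parse_terms(payload):
    """Extract (obo_id, label) pairs from an OLS-style JSON payload.

    Assumes the HAL layout with terms under payload['_embedded']['terms'].
    """
    terms = payload.get("_embedded", {}).get("terms", [])
    return [(term.get("obo_id"), term.get("label")) for term in terms]
```

Caching the parsed results per parent term would be one way to reduce the validation load mentioned earlier, at the cost of possibly stale vocabularies.
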

turbomam commented 3 years ago

Thanks @ddooley

I think I was too eager to create an issue here. I'm well on my way to creating templates by parsing LinkML files. I think that should be self-sufficient and not require any external lookups. I'll share my code soon.

Direct answer to your question: yes, I use OLS a lot, and using it for this poorly worded issue would make sense if external lookups were required.
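
For illustration, the self-sufficient LinkML route could be sketched in Python as below. This is not the code referred to in this thread. It assumes the schema has already been loaded from YAML into a plain dict and uses LinkML's `enums` / `permissible_values` / `meaning` layout; the enum name `geo_loc_name_enum` and the function `enum_rows` are hypothetical.

```python
def enum_rows(schema, enum_name, parent_class):
    """Turn one LinkML enum into (ontology_id, parent_class, label) rows.

    `schema` is a LinkML schema already parsed into a dict.
    A permissible value with no body (None) gets an empty ontology ID.
    """
    enum = schema["enums"][enum_name]
    rows = []
    for text, value in (enum.get("permissible_values") or {}).items():
        value = value or {}  # YAML keys with no body parse as None
        rows.append((value.get("meaning", ""), parent_class, text))
    return rows


# Hypothetical schema fragment, for illustration only.
example_schema = {
    "enums": {
        "geo_loc_name_enum": {
            "permissible_values": {
                "Alberta": {"meaning": "GAZ:00002566"},
                "British Columbia": {"meaning": "GAZ:00002562"},
                "Not Provided": None,
            }
        }
    }
}
template_rows = enum_rows(
    example_schema,
    "geo_loc_name_enum",
    "geo_loc_name (state/province/territory)",
)
```

Because everything comes from the schema itself, no SPARQL endpoint or OLS call is needed at template-build time.
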

turbomam commented 3 years ago

Here's my work in progress for creating a DataHarmonizer template from a LinkML schema.

I've shared the converted website as a GitHub page.

None of this is guaranteed to stay in the same location or to be up 100% of the time.