A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License
LinkML slot attributes for driving validations #300
This snippet shows how to list all of the attributes that are applicable to LinkML slots:
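```python
from linkml_runtime import SchemaView

# The LinkML metamodel, which defines everything a slot_definition may carry
meta_url = "https://raw.githubusercontent.com/linkml/linkml-model/main/linkml_model/model/schema/meta.yaml"
meta_view = SchemaView(meta_url)

# All slots (including inherited ones) that apply to the slot_definition class
sis = meta_view.class_induced_slots('slot_definition')
for i in sis:
    print(i.name)
```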
Here are some that I think should be used for validations:
| slot_attribute | notes |
|---|---|
| maximum_value | works with the NMDC templates now that @pkalita-lbl PR'ed a float caster in #299. |
| minimum_value | works with the NMDC templates now that @pkalita-lbl PR'ed a float caster in #299. |
| multivalued | todo, along with min and max cardinality? `\|` shouldn't appear in non-multivalued columns? |
| identifier | I think that this is being acted upon, as with MIxS' `source_mat_id`, which NMDC calls XXX. It makes the column required and enforces uniqueness, but it can only be applied to one column (i.e., one attribute per class). We need some other way to express that other columns should take unique values. |
| pattern | works, as composed regular expressions. |
| range | I don't think any action is taken on ranges on their own. NMDC has data and code for matching ranges to (regular expression) patterns within the LinkML schema. Is DataHarmonizer still validating based on `xsd` types in the `linkml-datastructure` branch, the way it does in the `main` branch? If so, we should make sure common LinkML classes and types are related to the `xsd` types. |
| string_serialization | NMDC has data and code for matching composed `string_serialization`s to (regular expression) `pattern`s, but that is really a misuse of `string_serialization`, which is meant to generate strings based on a template of attribute names. We should be doing this through `structured_pattern`s instead. Furthermore, `structured_pattern`s should take advantage of pre-composed chunks from LinkML `settings`. Is there a desire for any of this to happen in real time within DataHarmonizer? We will need to build up a library of `settings` and expansions, along with some understanding of the MIxS grammar, especially the use of `;` and `\|`. |
| structured_pattern | @sujaypatil96 and others are working on the expansion of `structured_pattern`s. See `string_serialization` above. |
| required | works. |
| id_prefixes | todo? Values in columns with `id_prefixes` would have to begin with one of the prefixes, then a colon, then some local portion. |
| maximum_cardinality | todo, along with `multivalued`? |
| minimum_cardinality | todo, along with `multivalued`? |
| ifabsent | todo? How would this relate to DataHarmonizer's mechanisms for default values? |
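For the attributes marked "works", a per-cell check could look roughly like the sketch below. It assumes every cell arrives as a string; the function name `check_cell` and its keyword arguments are hypothetical, not DataHarmonizer's API, and it is written in Python only to match the snippet above. The float cast is presumably why #299 mattered: compared as strings, `"10"` sorts below `"9"`. The sketch also assumes a `pattern` must match the whole cell.

```python
import re
from typing import Optional


def check_cell(value: str, *, required: bool = False,
               minimum_value: Optional[float] = None,
               maximum_value: Optional[float] = None,
               pattern: Optional[str] = None) -> list[str]:
    """Return error messages for one cell (hypothetical helper; empty list = valid)."""
    text = value.strip()
    if not text:
        # An empty cell is only an error when the slot is required.
        return ["value is required"] if required else []
    errors = []
    if minimum_value is not None or maximum_value is not None:
        try:
            number = float(text)  # cast first; string comparison would mis-order "10" and "9"
        except ValueError:
            return [f"{text!r} is not a number"]
        if minimum_value is not None and number < minimum_value:
            errors.append(f"{number} is below minimum_value {minimum_value}")
        if maximum_value is not None and number > maximum_value:
            errors.append(f"{number} is above maximum_value {maximum_value}")
    # Assumption: the pattern must cover the entire cell, not just a substring.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        errors.append(f"{text!r} does not match pattern {pattern!r}")
    return errors
```

For example, `check_cell("10", maximum_value=9)` reports that `10.0` is above the maximum, which a plain string comparison against `"9"` would miss.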
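The open `multivalued` and cardinality questions could be handled together, since cardinality only makes sense once a cell has been split into values. The sketch below assumes `|` is the delimiter (the MIxS use of `;` versus `|` is still unresolved, so it is kept in one constant); `check_cardinality` and `DELIMITER` are hypothetical names.

```python
from typing import Optional

DELIMITER = "|"  # assumed separator for multivalued cells (hypothetical)


def check_cardinality(value: str, *, multivalued: bool,
                      minimum_cardinality: Optional[int] = None,
                      maximum_cardinality: Optional[int] = None) -> list[str]:
    """Return error messages about the number of values in one cell."""
    text = value.strip()
    if not multivalued:
        # A single-valued column should not contain the delimiter at all.
        if DELIMITER in text:
            return [f"{DELIMITER!r} found in a non-multivalued column"]
        return []
    # Split on the delimiter, trimming whitespace and dropping empty pieces like "a||b".
    items = [part.strip() for part in text.split(DELIMITER) if part.strip()]
    errors = []
    if minimum_cardinality is not None and len(items) < minimum_cardinality:
        errors.append(f"{len(items)} value(s), fewer than minimum_cardinality {minimum_cardinality}")
    if maximum_cardinality is not None and len(items) > maximum_cardinality:
        errors.append(f"{len(items)} value(s), more than maximum_cardinality {maximum_cardinality}")
    return errors
```

Dropping empty pieces is a design choice: it treats `"a||b"` as two values rather than flagging a blank one, which may or may not be what curators expect.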
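Two column-level checks from the table can be sketched independently of `identifier`: a uniqueness check that could be attached to any column flagged as needing unique values, and the `id_prefixes` rule described above (a listed prefix, a colon, then a non-empty local part). Both helpers are hypothetical, and exact, case-sensitive prefix matching is an assumption.

```python
from collections import Counter


def duplicate_values(column: list[str]) -> set[str]:
    """Return the non-empty values that occur more than once in a column."""
    counts = Counter(v.strip() for v in column if v.strip())
    return {value for value, n in counts.items() if n > 1}


def has_allowed_prefix(value: str, id_prefixes: list[str]) -> bool:
    """True if value looks like PREFIX:local with PREFIX in id_prefixes."""
    # partition() splits on the first colon only, so the local part may itself contain colons.
    prefix, sep, local = value.strip().partition(":")
    return bool(sep) and bool(local) and prefix in id_prefixes
```

For example, `has_allowed_prefix("NCBITaxon:9606", ["NCBITaxon", "ENVO"])` is true, while a bare `"9606"` or an empty local part such as `"GO:"` is rejected.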
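The `structured_pattern` expansion mentioned above boils down to placeholder substitution: each `{name}` in the `syntax` string is replaced by the regular expression stored under that name in the schema's `settings`. The sketch below illustrates that idea assuming `interpolated: true` semantics; it is not LinkML's own implementation, and the setting values are made up. Placeholder names must start with a letter or underscore so that regex quantifiers such as `{2}` are left alone.

```python
import re


def expand_structured_pattern(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} in syntax with the regex stored in settings[name]."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in settings:
            raise KeyError(f"no setting named {name!r}")
        # Returned text is inserted literally, so backslashes in settings survive.
        return settings[name]

    return re.sub(r"\{([A-Za-z_]\w*)\}", substitute, syntax)


# Hypothetical settings, in the spirit of a library of pre-composed chunks.
settings = {
    "float": r"[-+]?\d+(?:\.\d+)?",
    "unit": r"[a-zA-Z]+",
}
regex = expand_structured_pattern(r"^{float} {unit}$", settings)
```

With these settings, `regex` accepts `"12.5 m"` but rejects `"12.5"`; a missing setting raises `KeyError` instead of silently producing a pattern that never matches.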
See also #267