GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal
38 stars 21 forks

restore some 'Expected value' content that was lost in MIxS v6.2.0 #740

Open turbomam opened 10 months ago

turbomam commented 10 months ago

MIxS v6.2.0 is based on the following input: https://github.com/GenomicsStandardsConsortium/mixs/raw/issue-610-temp-mixs-xlsx-home/mixs/excel/mixs_v6.xlsx

See the mixs6.2_release_candidate repo's project.Makefile for elaboration

The following spreadsheet was used to convert MIxS Value syntaxes and Expected values into LinkML ranges and patterns: https://github.com/GenomicsStandardsConsortium/mixs6.2_release_candidate/blob/main/config/mixs-valsyns-expvals-to-linkml-ranges-patterns.tsv

Some content may have been lost, such as this text for env_medium (MIXS:0000014):

The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]

turbomam commented 10 months ago

To what extent do the Expected values need to be reiterated verbatim in one of the SlotDefinition metaslots?

hysterectomy has this Expected value: 'hysterectomy status'

After the range/pattern conversion, that Expected value is no longer present in the slot definition:

  hysterectomy:
    description: Specification of whether hysterectomy was performed
    title: hysterectomy
    examples:
      - value: 'no'
    slot_uri: MIXS:0000287
    range: boolean
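
One way to check a single slot for this kind of loss is to search every string in its definition for the old Expected value. The following is only a minimal sketch: it assumes the slot definition has already been loaded into a plain Python dict (for example from mixs.yaml), and the helper names `collect_strings` and `expected_value_survives` are hypothetical, not part of LinkML or the MIxS tooling.

```python
def collect_strings(node):
    """Recursively gather every string found in a nested dict/list structure."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [s for value in node.values() for s in collect_strings(value)]
    if isinstance(node, list):
        return [s for item in node for s in collect_strings(item)]
    return []  # numbers, booleans and None carry no text to search


def expected_value_survives(slot_definition, expected_value):
    """Return True if the Expected value text appears verbatim (ignoring case)
    in any string-valued part of the slot definition."""
    needle = expected_value.strip().lower()
    return any(needle in s.lower() for s in collect_strings(slot_definition))


# The hysterectomy slot as shown above, written out as a dict.
hysterectomy = {
    "description": "Specification of whether hysterectomy was performed",
    "title": "hysterectomy",
    "examples": [{"value": "no"}],
    "slot_uri": "MIXS:0000287",
    "range": "boolean",
}

print(expected_value_survives(hysterectomy, "hysterectomy status"))
```

A verbatim substring match is deliberately strict. It will flag Expected values that were paraphrased into the description, so its output is a list of candidates for review rather than a list of confirmed losses.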

see also

turbomam commented 1 month ago

I would like a partner to help me with this. I can show some tricks for finding cases where the "Expected value" was lost between 6.0 and 6.2, but it would be nice to have somebody look with a fresh set of eyes and suggest a prioritized list of which ones to add back in.

turbomam commented 3 days ago

getting started again!

MIxS 6.2 is based on

https://github.com/GenomicsStandardsConsortium/mixs/raw/issue-610-temp-mixs-xlsx-home/mixs/excel/mixs_v6.xlsx

which is in a longstanding branch (@turbomam elaborate)

This makefile target in the mixs6.2_release_candidate repo converts that Excel file into the YAML file that is now https://github.com/GenomicsStandardsConsortium/mixs/blob/v6.2.0/src/mixs/schema/mixs.yaml. It also stashes a harmonized but otherwise unchanged view of the two worksheets in that XLSX file as https://github.com/GenomicsStandardsConsortium/mixs6.2_release_candidate/blob/main/GSC-excel-harmonized-repaired/mixs_v6.xlsx.harmonized.tsv

LEGACY_PREFIX=mixs_v6.xlsx

generated-schema/mixs_6_2_rc.yaml:
    $(RUN) write-mixs-linkml \
         --gsc-excel-input 'https://github.com/GenomicsStandardsConsortium/mixs/raw/issue-610-temp-mixs-xlsx-home/mixs/excel/mixs_v6.xlsx' \
         --gsc-excel-output-dir downloads \
         --classes-ssheet config/build-test-only/schema-for-classes-schemasheet.tsv \
         --classes-ssheet config/build-test-only/prefixes-for-classes-schemasheet.tsv \
         --classes-ssheet config/classes-schemasheet.tsv \
         --non-ascii-replacement ' ' \
         --schema-name $(RC_PREFIX) \
         --textual-key 'Structured comment name' \
         --linkml-stage-mods-file config/linkml-stage-mixs-modifications.yaml \
         --range-pattern-inference-file config/mixs-valsyns-expvals-to-linkml-ranges-patterns.tsv \
         --tables-stage-mods-file config/mixs-tables-stage-modifications.tsv \
         --harmonized-mixs-tables-file GSC-excel-harmonized-repaired/$(LEGACY_PREFIX).harmonized.tsv \
         --repaired-mixs-tables-file GSC-excel-harmonized-repaired/$(RC_PREFIX).repaired.tsv \
         --extracted-examples-out extracted-data/$(RC_PREFIX).extracted-examples.yaml \
         --repair-report conflict-reports/conflict-repair-report.tsv \
         --unmapped-report other-reports/un-handled-stringsers-expvals.tsv \
         --schema-file-out $@

turbomam commented 3 days ago

The YAML form of MIxS 6.2 can be converted into a sheet with

 poetry run linkml2schemasheets-template \
    --source-path src/mixs/schema/mixs.yaml \
    --output-path mixs-schemasheets-template.tsv \
    --debug-report-path mixs-schemasheets-template-debug.txt \
    --log-file mixs-schemasheets-template-log.txt \
    --report-style concise

The mixs-schemasheets-template.tsv output from this (which isn't checked in anywhere) could be compared to https://github.com/GenomicsStandardsConsortium/mixs6.2_release_candidate/blob/main/GSC-excel-harmonized-repaired/mixs_v6.xlsx.harmonized.tsv
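
A comparison of the two TSV files might look roughly like the sketch below. It rests on several assumptions: that the harmonized sheet has 'Structured comment name' and 'Expected value' columns, that the schemasheets template has a column holding the slot name (called `slot` here, which is a guess), and that a verbatim, case-insensitive substring match is a good enough first pass. The function names and the toy rows are hypothetical and are there only to make the example runnable.

```python
import csv
import io


def read_tsv(text, key_col):
    """Index the rows of a TSV string by the value found in key_col.
    If a key occurs more than once, the last row wins."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key_col]: row for row in reader if row.get(key_col)}


def lost_expected_values(legacy_tsv, schema_tsv, legacy_key, schema_key, expval_col):
    """List (name, expected value) pairs whose Expected value text does not
    appear in any cell of the matching schema row."""
    legacy = read_tsv(legacy_tsv, legacy_key)
    schema = read_tsv(schema_tsv, schema_key)
    lost = []
    for name, row in legacy.items():
        expected = (row.get(expval_col) or "").strip()
        if not expected or name not in schema:
            continue  # nothing to look for, or no counterpart to search
        cells = [v for v in schema[name].values() if isinstance(v, str)]
        if expected.lower() not in " ".join(cells).lower():
            lost.append((name, expected))
    return lost


# Toy stand-ins for the two files; real column names may differ.
legacy_text = (
    "Structured comment name\tExpected value\n"
    "hysterectomy\thysterectomy status\n"
    "toy_slot\ttoy value\n"
)
schema_text = (
    "slot\tdescription\trange\n"
    "hysterectomy\tSpecification of whether hysterectomy was performed\tboolean\n"
    "toy_slot\tA made-up slot that still mentions its toy value\tstring\n"
)

for name, expected in lost_expected_values(
    legacy_text, schema_text, "Structured comment name", "slot", "Expected value"
):
    print(f"{name}: '{expected}' not found in the schema row")
```

With real files, the two strings would be replaced by the contents of mixs_v6.xlsx.harmonized.tsv and mixs-schemasheets-template.tsv. Any header rows that schemasheets adds below the column names would need to be skipped first.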