GenomicsStandardsConsortium / mixs

“Minimum Information about any (X) Sequence” (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

extension dendrogram with code #767

turbomam closed 7 months ago

ramonawalls commented 7 months ago

@turbomam Before I approve, I have one question: What is the purpose of soil-vs-water-slot-usage.yaml and why only that one? Okay, that's two questions.

turbomam commented 7 months ago

What is the purpose of soil-vs-water-slot-usage.yaml? Why only that one?

Excellent question! Thanks for paying attention. I should have done a better job of creating textual documentation for non-developers.

pyproject.toml includes this new block:

[tool.poetry.scripts]
extension-distances = 'scripts.extension_distances:generate_dendrogram'
extension-differences = 'scripts.extension_slot_diffrences:set_arithmatic'

and project.Makefile contains this new block, which illustrates how to use those new scripts and what kind of output they generate:

extensions-dendrogram.pdf:
    $(RUN) extension-distances \
        --schema src/mixs/schema/mixs.yaml \
        --output $@

soil-vs-water-slot-usage.yaml: src/mixs/schema/mixs.yaml
    $(RUN) extension-differences \
        --schema $< \
        --ext1 Soil \
        --ext2 Water > $@

The extension-distances script generates a PDF dendrogram of the term-usage distances between all Extensions and doesn't require any configuration besides specifying inputs and output.
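For non-developers, here is a rough sketch of what a "term-usage distance" between Extensions could look like. It is not the code from scripts/extension_distances.py: the Jaccard metric, the function names, and the tiny slot sets are all assumptions made for illustration. The real script reads its terms from src/mixs/schema/mixs.yaml and draws the result as a dendrogram.

```python
from itertools import combinations

# Hypothetical slot sets for illustration only; the real ones would be
# read from src/mixs/schema/mixs.yaml.
extension_slots = {
    "Soil": {"depth", "elev", "ph", "soil_type"},
    "Water": {"depth", "elev", "ph", "salinity"},
    "Air": {"elev", "alt", "humidity"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(slots_by_extension):
    """Map each unordered pair of extension names to its distance."""
    return {
        (x, y): jaccard_distance(slots_by_extension[x], slots_by_extension[y])
        for x, y in combinations(sorted(slots_by_extension), 2)
    }


distances = pairwise_distances(extension_slots)

# Print the closest pairs first; a clustering library would consume the
# same distances to build the tree shown in a dendrogram.
for pair, distance in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(distance, 3))
```

In this toy data, Soil and Water share three of their five distinct terms, so they end up closer to each other than either is to Air.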

extension-differences generates a textual report of the shared and disjoint terms between one pair of Extensions. It could be run as needed against any pair, or we could enhance it to generate one very long report of all pairings.
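The shared/disjoint report boils down to set arithmetic. The sketch below is an illustration, not the contents of the script behind extension-differences: the function name, the report keys, the printed layout, and the example slot sets are hypothetical.

```python
def slot_differences(name1, slots1, name2, slots2):
    """Split two extensions' slots into shared and extension-only groups."""
    return {
        "shared": sorted(slots1 & slots2),
        f"{name1}_only": sorted(slots1 - slots2),
        f"{name2}_only": sorted(slots2 - slots1),
    }


# Hypothetical slot sets for illustration only.
soil = {"depth", "elev", "ph", "soil_type"}
water = {"depth", "elev", "ph", "salinity"}

report = slot_differences("Soil", soil, "Water", water)

# Emit a simple indented listing, one section per group.
for section, slots in report.items():
    print(f"{section}:")
    for slot in slots:
        print(f"  - {slot}")
```

Covering all pairings would mostly mean looping the same function over itertools.combinations of the extension names.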

The PR adds sample output from both of those scripts in the root of the repo. That's probably not a good long-term practice. Maybe we should just add an assets/ directory in a subsequent PR.