GenomicsStandardsConsortium / mixs

Minimum Information about any (X) Sequence (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Create minimal per-class slot reporter; compare to existing LinkML, NMDC and MIxS scripts #811

Open turbomam opened 2 days ago

turbomam commented 2 days ago

I suspect that this overlaps with a lot of existing issues, along the lines of "make it easy for people to find release artifacts".

turbomam commented 2 days ago

I'm starting by generating Markdown documentation for all of the scripts in src/scripts, using ChatGPT 4.

turbomam commented 2 days ago

find . -name "*.py" | sort
./src/mixs/datamodel/mixs.py
./src/mixs/_version.py

./src/scripts/camel_case_enums.py
./src/scripts/combinations_list_generator.py
./src/scripts/enumerations_list_generator.py
./src/scripts/extension_distances.py
./src/scripts/extension_slot_diffrences.py

./src/scripts/__init__.py

./src/scripts/isolate_slots.py
./src/scripts/mixs_slots_report.py
./src/scripts/organize_excel_files.py
./src/scripts/term_list_generator.py

./tests/__init__.py
./tests/test_data.py

turbomam commented 2 days ago

I moved some of those scripts to an inactive directory.

src/scripts/isolate_slots.py is derived from the output of linkml2schemasheets-template and comes closest to generating a tabular representation of the schema. However, it creates one global slot report, and we want per-class slot reports that show the induced slot usage.

(I don't think we have determined yet what takes precedence when a class has an is_a parent and also uses mixins.)

turbomam commented 2 days ago

I am inclined to read the schema into a SchemaView, iterate over the in-scope classes, and report selected properties of the induced slots/attributes.
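A minimal sketch of that approach, to make the idea concrete. It assumes linkml-runtime is installed and relies on SchemaView's all_classes() and class_induced_slots() methods. The function names, the choice of reported properties, and the TSV output format are assumptions made for illustration, not decisions from this issue.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

# Which slot properties to report is an assumption; adjust as needed.
REPORTED_PROPERTIES = ["name", "range", "required", "multivalued", "description"]


def induced_slot_rows(
    schema_path: str, class_names: Optional[Iterable[str]] = None
) -> Dict[str, List[dict]]:
    """Return {class_name: [one dict per induced slot]} for the in-scope classes."""
    # Imported lazily so rows_to_tsv() can be used without linkml-runtime.
    from linkml_runtime.utils.schemaview import SchemaView

    view = SchemaView(schema_path)
    # Default scope (hypothetical): every class in the schema.
    in_scope = list(class_names) if class_names is not None else list(view.all_classes())
    report: Dict[str, List[dict]] = {}
    for class_name in in_scope:
        # class_induced_slots() returns the slots as they apply to this class,
        # after inheritance and slot_usage have been taken into account.
        report[class_name] = [
            {prop: getattr(slot, prop, None) for prop in REPORTED_PROPERTIES}
            for slot in view.class_induced_slots(class_name)
        ]
    return report


def rows_to_tsv(rows: List[dict]) -> str:
    """Serialize one class's slot rows as TSV text; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REPORTED_PROPERTIES,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {key: "" if row.get(key) is None else row[key] for key in REPORTED_PROPERTIES}
        )
    return buffer.getvalue()
```

Calling induced_slot_rows("path/to/schema.yaml") (a placeholder path) and passing each class's rows to rows_to_tsv() would give one table per class. Keeping the SchemaView traversal separate from the serialization should make it easier to move the traversal into a LinkML repo later.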

@cmungall would prefer that this kind of thing be developed in a LinkML repo, but the turnaround there is slower. So develop it here in a generalizable way, with good coding practices, so that it can eventually be moved into LinkML.

I don't think the stock linkml2sheets will work here either.

turbomam commented 2 days ago

From @pbuttigieg: generate the reports in a dev path and then move them to a root path upon release, like ODK does.
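One way the "generate in dev, promote on release" step could look, as a rough sketch only. The function name and directory layout are hypothetical, and this is not a description of ODK's actual mechanism. It copies rather than moves, so the dev copies stay in place.

```python
import shutil
from pathlib import Path
from typing import List


def promote_reports(dev_dir: Path, release_dir: Path) -> List[Path]:
    """Copy every file under dev_dir into release_dir, preserving relative paths."""
    copied: List[Path] = []
    for source in sorted(dev_dir.rglob("*")):
        if source.is_file():
            target = release_dir / source.relative_to(dev_dir)
            # Create intermediate directories on demand.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

A release task could call promote_reports(Path("dev/reports"), Path("reports")), where both paths are placeholders, after the per-class reports have been regenerated.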

turbomam commented 2 days ago

This could be backported to older releases, but the reports for previous releases wouldn't be bundled cumulatively with each new release.