GenomicsStandardsConsortium / mixs

"Minimum Information about any (X) Sequence" (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

Let's make sure we are all comfortable with the ideas of sections and subsets. #679

Open turbomam opened 8 months ago

turbomam commented 8 months ago

only1chunts' update to this ticket:

There are now #771 and #772 for discussing what section means and what values it could take. I would like to shift the focus of this ticket to the technical side: while the CIG refers to this as section, the LinkML implementation will probably use something else. The original ticket content (below) gives various options for that, e.g. in_subset, is_a (hierarchy) or slot_group, and I want this ticket to focus on that aspect.

turbomam's original ticket content:

In the MIxS 6.1 Excel sheet, terms can have a 'Section' value. That has been implemented with subset definitions and in_subset assertions in the v6.2.0 LinkML YAML file.

The MIxS sections were:

  • sequencing
  • environment
  • nucleic acid sequence source
  • investigation

In my eyes, those provide a useful and actionable grouping of slots. For example, from an NMDC perspective, adapters is not an attribute of a sample, but rather an attribute of a sequencing process. I do understand that in some data serializations it may be "practical" to bind adapters to a sample in a shortcut relationship.

I don't feel like sections or subsets have been discussed as much as Checklist classes, Extension classes and terms/slots. I don't think we have any records that capture our thoughts as well as these notes about slot attributes.

We have a few ways to group slots thematically in LinkML, like in_subset, is_a (hierarchy) or slot_group. However, if we want to retain grouping like that but don't like the language "subset", then we can use a different heading in the documentation pages.
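
For concreteness, the three grouping mechanisms could look roughly like this in a LinkML schema. This is a minimal sketch, not the v6.2.0 file: the descriptions and the names sequencing_term, sequencing_section, example_term_b and example_term_c are made up for illustration. Only adapters and the sequencing section come from the discussion here.

```yaml
# Illustrative fragment only; not copied from mixs.yaml.
subsets:
  sequencing:
    description: Terms that describe the sequencing process (hypothetical wording).

slots:
  # Option 1: in_subset. The slot points at a named subset.
  adapters:
    in_subset:
      - sequencing

  # Option 2: is_a hierarchy. An abstract parent slot plays the role of the section.
  sequencing_term:          # hypothetical parent slot
    abstract: true
  example_term_b:           # hypothetical slot
    is_a: sequencing_term

  # Option 3: slot_group. The slot points at a dedicated grouping slot.
  sequencing_section:       # hypothetical grouping slot
    is_grouping_slot: true
  example_term_c:           # hypothetical slot
    slot_group: sequencing_section
```

As far as I understand the metamodel, in_subset is multivalued, so a slot can sit in several groups at once, whereas is_a and slot_group each give a slot a single parent or group.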

I am asking to discuss this in a TWG or CIG meeting and make some link-able notes, but won't object to postponing it for a while.

_Note that I unilaterally added a combination_classes subset to v6.2.0_. That's more infrastructural than thematic. I wanted to simplify special handling for combination classes, but for now they can also be distinguished by the fact that they both have is_a relations and use mixins. I'm not convinced that those will always be good differentia. So that might argue for implementing the old sections as slot_groups, which are not used in v6.2.0 and don't have any baggage.
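
To illustrate the two ways of recognizing a combination class, here is a hedged sketch. All class names are placeholders, and which parent is the is_a target versus the mixin is an assumption made for the example, not a statement about the v6.2.0 file.

```yaml
# Illustrative fragment only; class names are hypothetical.
subsets:
  combination_classes:
    description: Classes that combine a checklist with an extension.

classes:
  ExampleChecklist:
    mixin: true
  ExampleExtension:
    description: Placeholder extension class.
  ExampleChecklistExampleExtension:
    is_a: ExampleExtension        # structural cue 1: has an is_a parent
    mixins:
      - ExampleChecklist          # structural cue 2: uses a mixin
    in_subset:
      - combination_classes       # explicit marker, independent of the structure
```

The in_subset marker keeps working even if some future non-combination class also happens to use both is_a and mixins, which is the concern about those being weak differentia.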

only1chunts commented 8 months ago

This sounds like something that should be discussed in person, as it's sounding a bit like a situation where we're talking at cross purposes. And you are correct that it's not been discussed in depth yet.

On Thu, 19 Oct 2023, 20:18 Mark A. Miller, @.***> wrote:

In the MIxS 6.1 Excel sheet, terms can have a 'Section' value. That has been implemented with subset definitions https://github.com/GenomicsStandardsConsortium/mixs/blob/a66c92b9d7d68b0bfefd9dacb54081c261af4a9d/src/mixs/schema/mixs.yaml#L24-L29C17 and in_subset https://github.com/GenomicsStandardsConsortium/mixs/blob/a66c92b9d7d68b0bfefd9dacb54081c261af4a9d/src/mixs/schema/mixs.yaml#L3959-L3971 assertions in the v6.2.0 LinkML YAML file.

The MIxS sections were:

  • sequencing
  • environment
  • nucleic acid sequence source
  • investigation

In my eyes, those provide a useful and actionable grouping of slots. For example, from an NMDC perspective, adapters is not an attribute of a sample, but rather an attribute of a sequencing process. I do understand that in some data serializations it may be "practical" to bind adapters to a sample in a shortcut relationship.

I don't feel like sections or subsets have been discussed as much as Checklist classes, Extension classes and terms/slots. I don't think we have any records that capture our thoughts as well as these notes about slot attributes https://docs.google.com/document/d/1LzNvt3b09JSNxlf2e2IBwzc33AXe9-mc1GHUR0iBIhg/edit#bookmark=id.gipej8ej40qs .

We have a few ways to group slots thematically in LinkML, like in_subset, is_a (hierarchy) or slot_group. However, if we want to retain grouping like that but don't like the language "subset", then we can use a different heading in the documentation pages.

I am asking to discuss this in a TWG or CIG meeting and make some link-able notes, but won't object to postponing it for a while.

Note that I unilaterally added a combination_classes subset to v6.2.0. That's more infrastructural than thematic. I wanted to simplify special handling for combination classes, but for now they can also be distinguished by the fact that they both have is_a relations and use mixins. I'm not convinced that those will always be good differentia. So that might argue for implementing the old sections as slot_groups, which are not used in v6.2.0 and don't have any baggage.


turbomam commented 8 months ago

Sure, discussing in person is a good idea.

Executive summary: