BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

Add hasBioPolymerSequence type to BioChemEntity #583

Open AlasdairGray opened 2 years ago

AlasdairGray commented 2 years ago

We already had a discussion (starting here) about adding hasBioPolymerSequence as a sub-property of hasRepresentation and moving inChI, inChIKey, and smiles properties to being sub-properties as well.

The outcomes of the discussion did not make it into the schema.org submission (I suspect confusion in the generation of the ttl files submitted, but it was a long time ago now).

Appropriate changes need to be made and can be done as part of #542

Tasks:

AlasdairGray commented 2 years ago

Working on this within the DDE. Have generated the updated BioChemEntity type (v0.8-DRAFT) but it is not currently loading back into the DDE.

gtsueng commented 1 year ago

Updated to DDE

gtsueng commented 1 year ago

still needs to be updated to the website

gtsueng commented 1 year ago

It's not clear to me that properties can just be nested like that. I have not seen examples of it in schema.org. If it's desirable to nest properties, it may be necessary to create a new type, 'BioChemEntityRepresentation' or something along those lines which includes the properties: hasBioPolymerSequence, inChI, inChIKey, and smiles. Then, assign this new type as the expected type for hasRepresentation. If anyone knows of an example with such a nesting in schema.org, please share.
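For illustration, the alternative described above (a dedicated type used as the expected type of hasRepresentation) might produce markup shaped roughly like the sketch below. The type name BioChemEntityRepresentation is only the name floated in this comment and is not defined in schema.org or Bioschemas; the ethanol values are example data.

```python
import json

# Hypothetical markup: "BioChemEntityRepresentation" is not an existing
# schema.org or Bioschemas type; it is the name suggested in the comment.
nested = {
    "@context": "https://schema.org",
    "@type": "MolecularEntity",
    "name": "ethanol",
    "hasRepresentation": {
        "@type": "BioChemEntityRepresentation",
        "inChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
        "smiles": "CCO",
    },
}

# Serialize to the JSON-LD text that would be embedded in a page.
print(json.dumps(nested, indent=2))
```

Here the identifiers sit one level down, inside the value of hasRepresentation, which is a structural nesting rather than a property hierarchy.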

ljgarcia commented 1 year ago

@gtsueng properties can be nested, see for instance https://schema.org/masthead (sub-property of https://schema.org/publishingPrinciples), https://schema.org/accountId (sub-property of https://schema.org/identifier), or https://schema.org/tocEntry (sub-property of https://schema.org/hasPart). A Property is indeed a child of the Intangible type in schema.org

    {
      "@id": "schema:tocEntry",
      "@type": "rdf:Property",
      "rdfs:comment": "Indicates a [[HyperTocEntry]] in a [[HyperToc]].",
      "rdfs:label": "tocEntry",
      "rdfs:subPropertyOf": {
        "@id": "schema:hasPart"
      },
      "schema:domainIncludes": {
        "@id": "schema:HyperToc"
      },
      "schema:isPartOf": {
        "@id": "https://pending.schema.org"
      },
      "schema:rangeIncludes": {
        "@id": "schema:HyperTocEntry"
      },
      "schema:source": {
        "@id": "https://github.com/schemaorg/schemaorg/issues/2766"
      }
    }

We have not proposed any property with sub-properties, but this is indeed what @AlasdairGray suggested. Could you please take a second look at it? Thanks

gtsueng commented 1 year ago

@ljgarcia To be clear, we want to just define these properties as nested for the sake of organizing the properties, correct? The property hierarchies in Schema.org do not appear to have any effect on the structure of their use in a Class and appear to be defined in a hierarchy just for the sake of organizing the properties. For example, https://schema.org/masthead is a subproperty of publishingPrinciples, and is used in NewsMediaOrganization. This does not mean that NewsMediaOrganization has a property called publishingPrinciples for which a subproperty called masthead is used to store a CreativeWork object. Instead, NewsMediaOrganization just has a property called masthead for which a CreativeWork is expected. That's it.

So from a Bioschemas perspective, this would mean that the property hasBioPolymerSequence would be used in BioChemEntity without hasRepresentation; inChI (and inChIKey and smiles) would be used in MolecularEntity without hasRepresentation; and hasRepresentation would just be used in ChemicalSubstance. Am I understanding this correctly? There would be no attempt to use inChI under hasRepresentation for ChemicalSubstance or anything like that.
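As a rough sketch of that reading, a MolecularEntity would carry the identifiers directly as its own properties, with hasRepresentation not appearing at all. The ethanol values are example data, not taken from this thread.

```python
import json

# Flat usage: each identifier is a direct property of the entity.
# The property hierarchy, if any, lives only in the vocabulary definition.
flat = {
    "@context": "https://schema.org",
    "@type": "MolecularEntity",
    "name": "ethanol",
    "inChI": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "inChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "smiles": "CCO",
}

print(json.dumps(flat, indent=2))
```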

ljgarcia commented 1 year ago

Yes, the idea is organizing the properties. The property hierarchy would have an impact on validation. As for the implications discussed in the last paragraph, I am not sure. Better double check with @egonw as to what makes sense for MolecularEntity and ChemicalSubstance. It might be we do not need/want the property hierarchy (not if it complicates things too much and has little/no effect).

gtsueng commented 1 year ago

I guess it's not completely clear to me how the property hierarchy would affect validation since each property is used in an unnested fashion in the corresponding class/type.

For example, Audiobook has a property readBy which is a subproperty of actor. It's not like you can just use the property actor in Audiobook in lieu of readBy--that would give an error. Similarly, Movie uses the property actor for which you cannot substitute readBy and still have it validate properly.
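A minimal sketch of the kind of check that would behave this way, assuming the validator compares each property against a per-type list and never consults rdfs:subPropertyOf. That is an assumption about validator behaviour, and the property lists below are abbreviated for illustration; the real types accept many more properties.

```python
# Abbreviated, illustrative property lists (not the full schema.org lists).
ALLOWED = {
    "Audiobook": {"name", "readBy", "duration"},
    "Movie": {"name", "actor", "director"},
}


def unknown_properties(node):
    """Return the properties of a JSON-LD node that its type does not list."""
    allowed = ALLOWED[node["@type"]]
    return sorted(
        key for key in node if not key.startswith("@") and key not in allowed
    )


audiobook = {"@type": "Audiobook", "name": "Example", "actor": "Jane Doe"}
movie = {"@type": "Movie", "name": "Example", "readBy": "Jane Doe"}

print(unknown_properties(audiobook))  # ['actor']
print(unknown_properties(movie))      # ['readBy']
```

Under this kind of check the sub-property relationship between readBy and actor plays no role in either direction.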

ivanmicetic commented 1 year ago

Update on hasBioPolymerSequence property: the BioChemEntity v0.8-DRAFT type has hasBioPolymerSequence as a new property, not yet integrated in schema.org. This implies that all profiles and types inheriting from BioChemEntity class will need to be updated.

Profiles inheriting BioChemEntity class (only latest release and draft):

Types inheriting BioChemEntity class:

gtsueng commented 1 year ago

Taxon is a child of Thing, not BioChemEntity.

Gene and Protein had this property long before BioChemEntity had it, so it should already be there.

ProteinAnnotation has been deprecated -- no reason to update it at this point as it's been superseded by SequenceAnnotation

Sample is pending deprecation, to be superseded by BioSample

BioSample is awaiting additional changes from the working group (and potential BioHackEU2023 project)

For MolecularEntity, ChemicalSubstance (and anything else being developed by the Chemical Working Group), it is unclear whether this property will be used directly or whether a child or parent property will be used instead (see discussion on nesting of properties above).

Everything else should be updated (edited by LJ) Types

Profiles:

ljgarcia commented 1 year ago

@bedroesb @AlasdairGray @ivanmicetic Could you please have a look at the pending task about property colors?

ljgarcia commented 1 year ago

@egonw @sneumann @gtsueng @nsjuty @oxgiraldo @albangaignard let's discuss the proposal of having nested properties for the three (four?) identification options for MolecularEntity

The idea here (as far as I understand) would be having hasRepresentation as property for MolecularEntity as "minimum" in the profile but using either hasRepresentation or any of its children for a particular individual of type MolecularEntity.

This is an ontology/validation question. If we only specify hasRepresentation for the type MolecularEntity, can we have a MolecularEntityIndividual using inChIKey instead and expect that reasoners and validators (e.g., ShEx, SHACL) do not complain about it and give us the expected output? The expected output in this case would be the reasoner not complaining and the validation passing.
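To make the question concrete, here is a small sketch of the difference sub-property inference would make to a "at least one hasRepresentation" check. The hierarchy in SUB_PROPERTY_OF is the one proposed in this issue, not something currently in schema.org, and the subject IRI is made up. Whether a given ShEx or SHACL validator applies such inference before validating depends on the tool and its configuration, so this only shows the two possible outcomes.

```python
# Proposed (not yet existing) hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {
    "inChI": "hasRepresentation",
    "inChIKey": "hasRepresentation",
    "smiles": "hasRepresentation",
    "hasBioPolymerSequence": "hasRepresentation",
}


def with_subproperty_inference(triples):
    """Add (s, parent, o) for every (s, p, o) whose p has a parent property."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        while prop in SUB_PROPERTY_OF:
            prop = SUB_PROPERTY_OF[prop]
            inferred.add((subject, prop, obj))
    return inferred


def has_at_least_one(triples, subject, prop):
    """Minimum-cardinality check: does the subject have any value for prop?"""
    return any(s == subject and p == prop for s, p, _ in triples)


# "ex:ethanol" is a made-up identifier for the example individual.
data = {("ex:ethanol", "inChIKey", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N")}

# Without inference the individual has no hasRepresentation value.
print(has_at_least_one(data, "ex:ethanol", "hasRepresentation"))  # False

# With sub-property inference applied first, the check passes.
expanded = with_subproperty_inference(data)
print(has_at_least_one(expanded, "ex:ethanol", "hasRepresentation"))  # True
```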

@gtsueng already said that

For example, Audiobook has a property readBy which is a subproperty of actor. It's not like you can just use the property actor in Audiobook in lieu of readBy--that would give an error. Similarly, Movie uses the property actor for which you cannot substitute readBy and still have it validate properly.

If @gtsueng is right, then I do not see any advantage in having nested properties.

Comments?

gtsueng commented 1 year ago

Regarding:

"Fix colour coding hasBioPolymerSequence in displays for draft profiles (should be Bioschemas not pending)"

Gene and Protein types had the property hasBioPolymerSequence long before this property was included in BioChemEntity. The Gene and Protein types that are currently pending on Schema.org have these properties. So they should be colored as pending, no?

[image]

gtsueng commented 1 year ago

Here's what happens when you use actor in the Audiobook example: [image]

and what happens when you use readBy in a Movie type: [image]

As this is a Movie type, the property actor is automatically parsed as a Person type in the validator. The same cannot be said of the property readBy, which is a subproperty of the property actor.

ljgarcia commented 1 year ago

Thanks @gtsueng for the analysis on the property hierarchy. I do not see a clear advantage in having the hierarchy. Unless @egonw @sneumann see an advantage there, I would suggest not implementing that change.

ljgarcia commented 1 year ago

Suggestion: drop the property hierarchy suggestion. @ivanmicetic if we drop it, is there anything else in this issue that is pending?

ljgarcia commented 1 year ago

Comments from the 2023.06.26 community call:

sneumann commented 1 year ago

I can see the benefit if we could tighten validation for e.g. MolecularEntity and specify Minimum for hasRepresentation, which in turn could be any of inChI/iupacName/smiles, but per @gtsueng's comment above, that doesn't work as intended through subProperties.

The new proposal is now to keep MolecularEntity.Identifier at Minimum, but to require not just a text value "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1" but a PropertyValue with value=MIIFHRBUBUHJMC-UHFFFAOYSA-N.1 and propertyID=http://semanticscience.org/resource/CHEMINF_000059 ?
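A sketch of what that could look like in markup, together with a simple check for it. The value and propertyID are the ones quoted above; the helper name has_typed_identifier and the exact rule it enforces are assumptions made for illustration, not an agreed Bioschemas validation rule.

```python
import json

# propertyID quoted in the proposal above.
IDENTIFIER_PROPERTY_ID = "http://semanticscience.org/resource/CHEMINF_000059"

entity = {
    "@context": "https://schema.org",
    "@type": "MolecularEntity",
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": IDENTIFIER_PROPERTY_ID,
        "value": "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1",
    },
}


def has_typed_identifier(node, property_id):
    """True if node has a PropertyValue identifier with the given propertyID."""
    identifier = node.get("identifier")
    candidates = identifier if isinstance(identifier, list) else [identifier]
    return any(
        isinstance(item, dict)
        and item.get("@type") == "PropertyValue"
        and item.get("propertyID") == property_id
        and bool(item.get("value"))
        for item in candidates
    )


# A bare text identifier would not satisfy the stricter requirement.
plain = {"@type": "MolecularEntity", "identifier": "MIIFHRBUBUHJMC-UHFFFAOYSA-N.1"}

print(json.dumps(entity, indent=2))
print(has_typed_identifier(entity, IDENTIFIER_PROPERTY_ID))
print(has_typed_identifier(plain, IDENTIFIER_PROPERTY_ID))
```

The first check succeeds and the second does not, which is the tightening the proposal is after: the identifier says what kind of identifier it is, without relying on a property hierarchy.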

egonw commented 11 months ago

I am sorry. I have been under a DDOS attack with project deliverables. Let me check.