TranslatorSRI / bl_lookup

Biolink Model Lookup Service
MIT License

Mixins included in /ancestors, but not /descendants? #42

Closed amykglen closed 3 years ago

amykglen commented 3 years ago

here's an example - this query for ancestors of NucleicAcidEntity:

curl -X GET "https://bl-lookup-sri.renci.org/bl/NucleicAcidEntity/ancestors?version=2.1.0" -H  "accept: application/json"

returns ancestors including some mixins, like PhysicalEssence:

[
  "biolink:MolecularEntity",
  "biolink:ChemicalEntity",
  "biolink:NamedThing",
  "biolink:Entity",
  "biolink:GenomicEntity",
  "biolink:ThingWithTaxon",
  "biolink:PhysicalEssence",
  "biolink:PhysicalEssenceOrOccurrent",
  "biolink:OntologyClass",
  "biolink:ChemicalOrDrugOrTreatment",
  "biolink:ChemicalEntityOrGeneOrGeneProduct",
  "biolink:ChemicalEntityOrProteinOrPolypeptide"
]

but a query for descendants of NucleicAcidEntity:

curl -X GET "https://bl-lookup-sri.renci.org/bl/NucleicAcidEntity/descendants?version=2.1.0" -H  "accept: application/json"

does not return any mixins:

[
  "biolink:NucleicAcidEntity",
  "biolink:Transcript",
  "biolink:RNAProduct",
  "biolink:NoncodingRNAProduct",
  "biolink:MicroRNA",
  "biolink:SiRNA",
  "biolink:RNAProductIsoform",
  "biolink:Exon",
  "biolink:CodingSequence"
]

is that how it's supposed to work?

I admit I'm still kind of new to the concept of mixins, so it may just be that my understanding is lacking here. but if they're supposed to be included, I'd think the descendants of NucleicAcidEntity would look something like this?

[
  "biolink:UnclassifiedOntologyClass",
  "biolink:GeneProductIsoformMixin",
  "biolink:CodingSequence",
  "biolink:OntologyClass",
  "biolink:GenomicEntity",
  "biolink:NucleicAcidEntity",
  "biolink:RNAProduct",
  "biolink:Transcript",
  "biolink:Exon",
  "biolink:SiRNA",
  "biolink:RNAProductIsoform",
  "biolink:TaxonomicRank",
  "biolink:PhysicalEssence",
  "biolink:GeneOntologyClass",
  "biolink:GeneProductMixin",
  "biolink:RelationshipType",
  "biolink:MicroRNA",
  "biolink:NoncodingRNAProduct"
]

cbizon commented 3 years ago

Hi @amykglen, sorry for sleeping on this question. I think it's operating as intended.

Mixins are simply other ancestors for an entity, so for Nucleic acid entity, the model says this:

  nucleic acid entity:
    is_a: molecular entity
    description: >-
      A nucleic acid entity is a molecular entity characterized by
      availability in gene databases of nucleotide-based sequence
      representations of its precise sequence; for convenience of
      representation, partial sequences of various kinds are included.
    aliases: [ 'sequence feature', 'genomic entity' ]
    mixins:
      - genomic entity
      - physical essence
      - ontology class

So it has 4 "parents": molecular entity, genomic entity, physical essence, and ontology class. The set of its ancestors is those 4 parents, plus all of their parents, and so on up the hierarchy.

The descendants are those things that have nucleic acid entity as an ancestor. I don't think it's possible for a mixin to have a non-mixin as an ancestor, so you shouldn't see any mixins as descendants of nucleic acid entity.

For example, biolink:GeneProductIsoformMixin has as its ancestor "gene product mixin", which has as its ancestor "gene or gene product", which has as its ancestor "macromolecular machine mixin", which has no more ancestors. So nucleic acid entity is not an ancestor of gene product isoform mixin, which is therefore not a descendant of nucleic acid entity.
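To make the reasoning above concrete, here is a minimal Python sketch. It is not the bl_lookup or Biolink Model Toolkit code. It assumes a toy fragment of the model stored as a plain dict, treats both `is_a` and `mixins` as parent links, and defines descendants as the inverse of ancestors. The `transcript` edge and the helper names (`parents`, `ancestors`, `descendants`) are assumptions made for illustration.

```python
# Toy fragment of the model; only the edges quoted in this thread are included.
# The "transcript" entry is an assumed direct child, added for illustration.
MODEL = {
    "nucleic acid entity": {
        "is_a": "molecular entity",
        "mixins": ["genomic entity", "physical essence", "ontology class"],
    },
    "molecular entity": {},
    "genomic entity": {},
    "physical essence": {},
    "ontology class": {},
    "transcript": {"is_a": "nucleic acid entity"},  # assumption
    "gene product isoform mixin": {"is_a": "gene product mixin"},
    "gene product mixin": {"is_a": "gene or gene product"},
    "gene or gene product": {"is_a": "macromolecular machine mixin"},
    "macromolecular machine mixin": {},
}


def parents(model, name):
    """Direct parents of a class: its is_a target plus every listed mixin."""
    entry = model.get(name, {})
    result = []
    if "is_a" in entry:
        result.append(entry["is_a"])
    result.extend(entry.get("mixins", []))
    return result


def ancestors(model, name):
    """All classes reachable by following parent links upward."""
    seen = set()
    stack = parents(model, name)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents(model, current))
    return seen


def descendants(model, name):
    """Every class that has `name` among its ancestors."""
    return {other for other in model if name in ancestors(model, other)}


print(sorted(ancestors(MODEL, "nucleic acid entity")))
print(sorted(descendants(MODEL, "nucleic acid entity")))
```

In this fragment the mixins show up among the ancestors of nucleic acid entity, while gene product isoform mixin never appears among its descendants, because no chain of parent links leads from that mixin up to nucleic acid entity. Unlike the service response quoted earlier, this sketch does not include the queried class itself in the result.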

Does that make sense?

amykglen commented 3 years ago

Ok, got it. Thanks for the explanation!

That's interesting... so when determining descendants for a mixin (e.g., GeneProductMixin), is the logic to:

1) figure out which categories GeneProductMixin is listed as a mixin for (Protein and RNAProduct), and then
2) find all the descendants of those categories identified in step 1 (ProteinIsoform, NoncodingRNAProduct, etc.)
3) also find any mixin descendants of GeneProductMixin (GeneProductIsoformMixin)

and return the union?

cbizon commented 3 years ago

And also return any categories that use the mixins found in step 3, along with those categories' descendants. Now, to be clear, I didn't implement this; it's part of Biolink Model Toolkit, but this is my understanding.
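The procedure discussed in this exchange (the three steps from the question plus the extra step from the reply) can be sketched in Python as follows. This is an illustration of the described logic only, not the Biolink Model Toolkit implementation. The toy model, the helper names, and the entry `example isoform category` are hypothetical, and the `is_a` edges for the isoform and noncoding classes are assumptions.

```python
# Toy fragment; edges marked "assumption" are not stated in this thread.
MODEL = {
    "gene product mixin": {},
    "gene product isoform mixin": {"is_a": "gene product mixin"},
    "protein": {"mixins": ["gene product mixin"]},
    "RNA product": {"mixins": ["gene product mixin"]},
    "protein isoform": {"is_a": "protein"},              # assumption
    "noncoding RNA product": {"is_a": "RNA product"},    # assumption
    # Hypothetical category that uses the isoform mixin directly.
    "example isoform category": {"mixins": ["gene product isoform mixin"]},
}


def is_a_descendants(model, name):
    """Classes whose is_a chain leads up to `name`."""
    found = set()
    changed = True
    while changed:
        changed = False
        for child, entry in model.items():
            parent = entry.get("is_a")
            if child not in found and (parent == name or parent in found):
                found.add(child)
                changed = True
    return found


def users(model, mixin):
    """Classes that list `mixin` under their `mixins` key."""
    return {name for name, entry in model.items()
            if mixin in entry.get("mixins", [])}


def mixin_descendants(model, mixin):
    """Union of the steps discussed above, for a single starting mixin."""
    # Step 3: mixins that descend from the starting mixin.
    mixin_family = {mixin} | is_a_descendants(model, mixin)
    result = mixin_family - {mixin}
    for member in mixin_family:
        # Step 1 (starting mixin) and the extra step (descendant mixins):
        # categories that use the mixin.
        for category in users(model, member):
            result.add(category)
            # Step 2: descendants of those categories.
            result |= is_a_descendants(model, category)
    return result


print(sorted(mixin_descendants(MODEL, "gene product mixin")))
```

On this fragment, `example isoform category` is reached only through the extra step, since it uses the descendant mixin rather than GeneProductMixin itself.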

I'll go ahead and close this issue, but feel free to reopen if necessary.