biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Add an association type for organism-organism or species-species interactions #240

Open cmungall opened 5 years ago

cmungall commented 5 years ago

Not related to translator, but it would be good to have a standard biolink-compliant way of stating globi graphs. cc @diatomsRcool. I will explain this later @jhpoelen

nlharris commented 3 years ago

@diatomsRcool you probably had no idea this was assigned to you!

diatomsRcool commented 3 years ago

I did not! and I do need this now, so I'll start on it.

diatomsRcool commented 3 years ago

We definitely need a predicate to represent interactions between organisms/taxa. We may also need a way to relate taxa to each other based on taxonomy or phylogeny. I'm only saying this because I suspect it might be worth using a taxonomy or phylogeny to propagate data from a highly observed taxon to a sparsely observed taxon with an uncertainty measure based on the taxonomic or phylogenetic relationship. Just thinking out loud here.

sierra-moxon commented 2 years ago

note: we have an association for taxa->taxa:


  taxon to taxon association:
    is_a: association
    defining_slots:
      - subject
      - predicate
      - object
    slot_usage:
      subject:
        range: organism taxon
      relation:
        values_from:
          - ro
      object:
        range: organism taxon
        description: >-
          An association between individuals of different taxa.

jhpoelen commented 2 years ago

@sierra-moxon very neat! If you'd like, I can add support for your biolink format in GloBI and generate related data products automatically.

Do you have some examples I can work with?

cmungall commented 2 years ago

Chatting with @sierra-moxon

The current docs on the t2t association are a bit confusing. They say "between individuals", but the domain and range (D+R) are taxa.

Recall that bl distinguishes between

So this leads to an increase in the number of association types, e.g.

  1. Chris (an instance of Homo sapiens) lives with Victor (an instance of Felis catus)
  2. Chris likes cats
  3. Humans and cats may form symbiotic relationships

Globi can capture all 3 and may for example capture edges of type 1 as evidence for edges of type 2 or 3. This of course mirrors many of the discussions in other parts of biolink/translator.

@jhpoelen support in globi would be great. Biolink can be expressed using different formats, and in fact the kgx serialization probably looks like the existing globi TSV

jhpoelen commented 2 years ago

@cmungall I had a suspicion you live with cats . . . ; )

Please provide specific examples for biolink compatible format with use-case. Ideally, I'd have a way to verify that the data is valid and enable reuse somehow.

diatomsRcool commented 2 years ago

@jhpoelen are you asking for a use case that justifies representing interaction data in a biolink compliant format?

jhpoelen commented 2 years ago

@diatomsRcool Yes, a use case that shows how/when to use the biolink compliant formatted data would be nice. In my experience, data products with fancy formats are often left unused because folks don't know that they exist or don't know how/when to use them.

jhpoelen commented 2 years ago

btw @diatomsRcool @sierra-moxon @cmungall happy to chat more if desired. Just want to get a sense for what kind of use y'all are envisioning.

diatomsRcool commented 2 years ago

ok, got it. My dream, if you will, is to have a hypothesis generator for ecology. Take a look at slide 3 in the talk at the link below. Ecology, of course, is very complicated and species interactions are integral to understanding what is going on. Does this make sense? https://doi.org/10.5281/zenodo.5104203

jhpoelen commented 2 years ago

@diatomsRcool Thanks for sharing your dream, and yes, it makes sense. Very exciting!

Please let me know what kind of GloBI data product you'd need to help work towards your vision.

A detailed example or two, in the exact format, would help to start producing the desired data product sooner rather than later.

jhpoelen commented 2 years ago

@diatomsRcool I am sure you have a lot on your plate, so I'll patiently wait for examples related to your biolink use case. Meanwhile, I'll assume the project is not yet in a state that allows non-biolink insiders to benefit from your inspired work.

diatomsRcool commented 2 years ago

Sorry. I think that if there is a GloBI data product that lists the EOL ID and/or the NCBITaxon ID of the two taxa interacting and the RO term for the interaction, that would be the bare minimum. This data product should be a file that can be downloaded via a URL. I don't quite remember all the different metadata you have in GloBI for each interaction. So, while the taxon IDs and RO terms are the bare minimum, all the accompanying metadata would also likely be useful. Does that help?

jhpoelen commented 2 years ago

@diatomsRcool thanks for clarifying.

As far as I understand, you'd do the conversion on your end, and existing data products like interactions.tsv.gz would work ok.

e.g.,

to get just a list of taxon ids with the associated RO term:

$ curl "https://zenodo.org/record/4460654/files/interactions.tsv.gz"\
 | gunzip\
 | mlr --tsvlite cut -f targetTaxonIds,interactionTypeId,sourceTaxonIds\
 | head -n2
sourceTaxonIds interactionTypeId targetTaxonIds
GBIF:1357746 | INAT_TAXON:198981 | IRMNG:10602220 | ITIS:654388 | NCBI:586963 | OTT:821172 | http://treatment.plazi.org/id/0392879B737BAB2943D5FB65FA89FA8E | http://treatment.plazi.org/id/8A998764149F7E00E335FFE1BCB17FE1 http://purl.obolibrary.org/obo/RO_0002622 GBIF:3034521 | INAT_TAXON:83154 | IRMNG:10203376 | ITIS:29906 | NCBI:40947 | OTT:428216 | http://treatment.plazi.org/id/B87F34AA383C06E0CF4F70EC89CBEF72
GBIF:1358021 | IRMNG:10940518 | ITIS:654371 | NCBI:586961 | OTT:821170 | http://treatment.plazi.org/id/0392879B737BAB2943D5FC0BFE04FBF9 http://purl.obolibrary.org/obo/RO_0002622 GBIF:3190089 | INAT_TAXON:54836 | IRMNG:11084524 | ITIS:505788 | NCBI:354526 | OTT:760765
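
The pipe-separated ID lists in this output can be narrowed to one identifier scheme per taxon, which is roughly the "bare minimum" requested earlier in the thread (two taxon IDs plus an RO term). What follows is only a minimal sketch, assuming the three-column TSV shown here arrives on stdin with its header line first. The function name `globi_to_triples`, the rewrite of `NCBI:` to `NCBITaxon:`, the shortening of the RO IRI to an `RO:` CURIE, and the `subject`/`predicate`/`object` header are illustrative assumptions, not an established GloBI or biolink export.

```shell
# Hypothetical helper: reduce GloBI taxon-id lists to NCBI-only triples.
# Expects TSV on stdin with columns: sourceTaxonIds, interactionTypeId, targetTaxonIds.
globi_to_triples() {
  awk -F '\t' '
    # Return the first NCBI:<id> entry of a " | "-separated list,
    # rewritten as NCBITaxon:<id>, or "" if the list has no NCBI entry.
    function ncbi(list,    n, parts, i) {
      n = split(list, parts, / *[|] */)
      for (i = 1; i <= n; i++)
        if (parts[i] ~ /^NCBI:/) {
          sub(/^NCBI:/, "NCBITaxon:", parts[i])
          return parts[i]
        }
      return ""
    }
    BEGIN { OFS = "\t" }
    # Replace the GloBI header with assumed subject/predicate/object names.
    NR == 1 { print "subject", "predicate", "object"; next }
    {
      s = ncbi($1); o = ncbi($3)
      p = $2
      # Shorten the OBO PURL for RO terms to a CURIE (assumed convention).
      sub(/^http:\/\/purl\.obolibrary\.org\/obo\/RO_/, "RO:", p)
      # Rows without an NCBI id on both sides are skipped.
      if (s != "" && o != "") print s, p, o
    }
  '
}
```

It could be appended to the pipeline above in place of `head`, for example `... | mlr --tsvlite cut -f targetTaxonIds,interactionTypeId,sourceTaxonIds | globi_to_triples`. Dropping rows that lack an NCBI identifier is a simplification; a real conversion would probably want to fall back to another scheme or report what was skipped.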

But . . . there is much more metadata available:

$ curl --silent "https://zenodo.org/record/4460654/files/interactions.tsv.gz"\
 | gunzip\
 | head -n1\
 | tr '\t' '\n'\
 | grep -n ".*"
1:sourceTaxonId
2:sourceTaxonIds
3:sourceTaxonName
4:sourceTaxonRank
5:sourceTaxonPathNames
6:sourceTaxonPathIds
7:sourceTaxonPathRankNames
8:sourceTaxonSpeciesName
9:sourceTaxonSpeciesId
10:sourceTaxonGenusName
11:sourceTaxonGenusId
12:sourceTaxonFamilyName
13:sourceTaxonFamilyId
14:sourceTaxonOrderName
15:sourceTaxonOrderId
16:sourceTaxonClassName
17:sourceTaxonClassId
18:sourceTaxonPhylumName
19:sourceTaxonPhylumId
20:sourceTaxonKingdomName
21:sourceTaxonKingdomId
22:sourceId
23:sourceOccurrenceId
24:sourceCatalogNumber
25:sourceBasisOfRecordId
26:sourceBasisOfRecordName
27:sourceLifeStageId
28:sourceLifeStageName
29:sourceBodyPartId
30:sourceBodyPartName
31:sourcePhysiologicalStateId
32:sourcePhysiologicalStateName
33:sourceSexId
34:sourceSexName
35:interactionTypeName
36:interactionTypeId
37:targetTaxonId
38:targetTaxonIds
39:targetTaxonName
40:targetTaxonRank
41:targetTaxonPathNames
42:targetTaxonPathIds
43:targetTaxonPathRankNames
44:targetTaxonSpeciesName
45:targetTaxonSpeciesId
46:targetTaxonGenusName
47:targetTaxonGenusId
48:targetTaxonFamilyName
49:targetTaxonFamilyId
50:targetTaxonOrderName
51:targetTaxonOrderId
52:targetTaxonClassName
53:targetTaxonClassId
54:targetTaxonPhylumName
55:targetTaxonPhylumId
56:targetTaxonKingdomName
57:targetTaxonKingdomId
58:targetId
59:targetOccurrenceId
60:targetCatalogNumber
61:targetBasisOfRecordId
62:targetBasisOfRecordName
63:targetLifeStageId
64:targetLifeStageName
65:targetBodyPartId
66:targetBodyPartName
67:targetPhysiologicalStateId
68:targetPhysiologicalStateName
69:targetSexId
70:targetSexName
71:decimalLatitude
72:decimalLongitude
73:localityId
74:localityName
75:eventDateUnixEpoch
76:argumentTypeId
77:referenceCitation
78:referenceDoi
79:referenceUrl
80:sourceCitation
81:sourceNamespace
82:sourceArchiveURI
83:sourceDOI
84:sourceLastSeenAtUnixEpoch

Note that, for the sake of simplicity, not all of the linked taxonomies in {source|target}TaxonIds are included in the expanded fields (e.g., sourceTaxonId, sourceTaxonPathIds).
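
With this many fields, selecting columns by name is less error-prone than counting positions. For readers without Miller installed, here is a rough awk-only sketch of the same idea. The function name `tsv_select` is hypothetical, and the sketch assumes a tab-separated stream on stdin whose first line is the header.

```shell
# Hypothetical helper: print the named columns of a TSV read from stdin,
# in the order requested. Usage: tsv_select name1,name2,...
tsv_select() {
  awk -F '\t' -v want="$1" '
    BEGIN { OFS = "\t"; n = split(want, names, ",") }
    NR == 1 {
      # Map each header name to its column position.
      for (i = 1; i <= NF; i++) pos[$i] = i
      # Stop early if a requested column is not in the header.
      for (j = 1; j <= n; j++)
        if (!(names[j] in pos)) {
          print "unknown column: " names[j] > "/dev/stderr"
          exit 1
        }
    }
    {
      # Applies to the header line too, so the output keeps a header.
      line = $(pos[names[1]])
      for (j = 2; j <= n; j++) line = line OFS $(pos[names[j]])
      print line
    }
  '
}
```

For example, `curl --silent "https://zenodo.org/record/4460654/files/interactions.tsv.gz" | gunzip | tsv_select sourceTaxonIds,interactionTypeId,targetTaxonIds,referenceDoi` would add the reference DOI to the three minimal columns. Unlike the `mlr cut` call earlier, which kept the input column order, this sketch emits columns in the order they are requested.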

Hope this helps you to access the minimal information you need.

If you need more information, please do holler, as I realize that the text above might not be as informative to others as it is to me.