EnvironmentOntology / envo

A community-driven ontology for the representation of environments
http://www.environmentontology.org
Creative Commons Zero v1.0 Universal

EnvO terms for host-associated samples #1029

Closed jagadishcs closed 2 years ago

jagadishcs commented 3 years ago

@cmungall @pbuttigieg @TBKReddy

1) What is the biome/broad-scale environmental context in EnvO for a sample (environmental medium) that comes from a host organism? When a biosample is from a host (plant, animal, or human), can we have a new biome in EnvO, either a single 'host-associated' biome or separate plant-, animal-, and human-associated biomes? My concern is that if the material is host-derived, then for the microbial community the biome is the host organism. Therefore, the following broad-scale environmental context EnvO terms are required for samples that come from a host:

Host-associated biome, Animal-associated biome, Human-associated biome, Plant-associated biome (please refer to the third point for alternatives)

2) What would be the feature/local environmental context in EnvO for a sample that comes from a host organism? A simple example to contextualize the question: if the material is water, then the feature/local environmental context can be a freshwater river, lake, pond, etc. But if the biosample is from a host organism, say a leaf, what would be the appropriate feature/local environmental context EnvO term?

3) Alternatives: when the biosample is from a host organism, we may also consider using a terrestrial or aquatic biome, as applicable, for plants and animals, and a terrestrial biome for humans (basically to indicate where the host organism comes from). The feature can then be plant-, animal-, human-, or host-associated.

I prefer the first option: adding 'host-associated biome' to EnvO for biosamples that come from a host organism but contain microbial communities.

jagadishcs commented 3 years ago

@cmungall @pbuttigieg @TBKReddy

To create feature/Local environmental context for samples from host organisms:

Would it be a good idea to use the following MIxS/EnvO triad for a leaf biosample from the plant Brachypodium distachyon?

biome/Broad-scale environmental context: Plant-associated biome (not yet created in the EnvO, that is yet to be decided)
feature/Local environmental context: commelinids (NCBITaxon_4734) (closest lineage level available in the EnvO for Brachypodium distachyon)
material/Environmental medium: leaf (PO_0025034)
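For illustration only, the proposed triad can be written down as a small record. This is a minimal sketch: the `MixsTriad` class is hypothetical, the slot names `env_broad_scale`, `env_local_scale`, and `env_medium` are assumed to be the MIxS field names for the three contexts, and 'Plant-associated biome' is only the label proposed in this thread, not an existing EnvO term.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MixsTriad:
    """Hypothetical container for the three environmental context slots."""
    env_broad_scale: str  # biome / broad-scale environmental context
    env_local_scale: str  # feature / local environmental context
    env_medium: str       # material / environmental medium


# Values copied from the proposal above; the biome label is only a proposal
# and has no EnvO identifier.
leaf_sample = MixsTriad(
    env_broad_scale="Plant-associated biome",
    env_local_scale="commelinids (NCBITaxon_4734)",
    env_medium="leaf (PO_0025034)",
)

for slot, value in asdict(leaf_sample).items():
    print(f"{slot}: {value}")
```

Writing the triad out this way makes it easy to see that the host's identity ends up in the local-context slot, which is the point under discussion.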

Feature: the closest lineage of the plant makes sense as a feature/Local environmental context, since it meets the definition of environmental feature: "environmental features that are in the vicinity of and have a strong causal influence on the entity" (Pier Luigi Buttigieg et al. 2016).

Using the closest taxonomic lineage of the host plant as the local environmental context for a plant leaf biosample would be comparable to using an environmental feature such as a freshwater river, lake, or pond when the sample is water.

This is just one possibility; I am using this example to explore how best to build the MIxS/EnvO triad for samples from host organisms.

Thanks

jagadishcs commented 3 years ago

@cmungall @wdduncan @pbuttigieg @TBKReddy

Is it possible to discuss and decide on creating the host-associated biomes? This would help us assign EnvO terms to about 6K plant-associated biosamples.

The EnvO already has 'environment associated with a plant part or small plant' (ENVO_01001057) and 'environment associated with an animal part or small animal' (ENVO_01001055) under ecosystem. Therefore, I believe it would be appropriate and useful to create the following terms as biome/broad-scale environmental context in the EnvO:

Host-associated biome, Animal-associated biome, Human-associated biome, Plant-associated biome

Thank you

pbuttigieg commented 3 years ago

Hi @jagadishcs

@cmungall @kaiiam @wdduncan and I discussed this in our monthly ENVO editors' call.

We'll update the MIxS annotation wiki page with some guidance to address your questions. Check in there in a couple of hours.

If we can't answer something there, we'll post here too.

pbuttigieg commented 3 years ago

Hi @jagadishcs

The wiki page noted above has been updated with guidance for host-associated microbial samples.

Host-associated biome, Animal-associated biome, Human-associated biome, Plant-associated biome

We wouldn't really create terms such as these unless we're referring to the entire microbiome of a given organism. The approach suggested in the wiki - and the use of the MIxS host metadata fields for taxonomy - should get you the information such terms would provide (and more).

jagadishcs commented 3 years ago

Thanks @pbuttigieg for your response. @cmungall @TBKReddy

I find it difficult to accept the rule for 'env_broad_scale' for host-associated biosamples, which is now given as "entries should reflect the ecosystem the host is found in (e.g. an urban biome [ENVO:01000249] or a tundra biome [ENVO:01000180])".

I am not convinced by this rule; let me explain the reason with an example: if a human gut is the biosample from an individual living in an urban area, then assigning the biosample 'urban biome', versus assigning 'village biome' when the individual is from a village, as its env_broad_scale does not add useful value; the urban biome for a gut biosample (or for a leaf biosample taken from a tree from an urban area) does not meet the definition of biome (Pier Luigi Buttigieg et al., 2013); this issue is applicable to any plants or animals.

Therefore, creation of a few 'host-associated' terms as env_broad_scale/biomes in the EnvO for biosamples that originate from host organisms would be useful.

The following suggested terms will not compete with MIxS host metadata fields for taxonomy but meet the EnvO definition for biome and the MIxS definition for broad-scale environmental context: Host-associated biome, Animal-associated biome, Human-associated biome, Plant-associated biome

pbuttigieg commented 3 years ago

[...] if a human gut is the biosample from an individual living in an urban area, then assigning the biosample 'urban biome', versus assigning 'village biome' when the individual is from a village, as its env_broad_scale does not add useful value;

For clarity/precision, the human gut is not the sample: a portion of [tissue,mucus,...] from the human gut is the sample.

The broad scale environment of the host adds context on what that host is likely to be exposed to, which will affect its various microbiomes. The skin and gut microbiome(s) of an organism living in the desert will vary from that of an organism from the same taxon living in, e.g. a forest.

Humans are a bit of a special case as the built environment changes a lot of things, but the density of settlements (and the implicit services available) can help an initial search (e.g. to compare the microbiomes on the hands of urban vs village dwellers). There are of course more precise metadata that should be used too (diet profiles, etc.), but this is about the right level for the env_broad_scale field.

the urban biome for a gut biosample (or for a leaf biosample taken from a tree from an urban area) does not meet the definition of biome (Pier Luigi Buttigieg et al., 2013); this issue is applicable to any plants or animals.

I can't really follow the argument above.

Therefore, creation of a few 'host-associated' terms as env_broad_scale/biomes in the EnvO for biosamples that originate from host organisms would be useful.

I'm not sure this is true - what more do they bring relative to annotating the anatomical site + using the MIxS taxon/host fields? Do you have an example?

The following suggested terms will not compete with MIxS host metadata fields for taxonomy but meet the EnvO definition for biome and the MIxS definition for broad-scale environmental context: Host-associated biome, Animal-associated biome, Human-associated biome, Plant-associated biome

Hmm, I don't really agree. I see these in direct competition with/redundant with the taxon and host information in MIxS and I can't really see what more they bring.

I'm not sure what you mean with compliance to the ENVO definition of biome here. The microbiome is also embedded in the biome the host is embedded in. It's an order removed perhaps, but still contextually accurate.
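To make the "initial search" idea above concrete, here is a minimal sketch of grouping samples from one host taxon by their env_broad_scale value. The sample records and IDs (`S1` to `S4`) are invented for illustration, and the field names `host_taxid` and `env_broad_scale` are assumed to match the MIxS slots; nothing here comes from the wiki guidance itself.

```python
from collections import defaultdict

# Hypothetical sample records; sample IDs are invented for illustration.
# 9606 is the NCBI taxon ID for human, 10090 for mouse.
samples = [
    {"sample_id": "S1", "host_taxid": 9606, "env_broad_scale": "urban biome [ENVO:01000249]"},
    {"sample_id": "S2", "host_taxid": 9606, "env_broad_scale": "village biome"},
    {"sample_id": "S3", "host_taxid": 9606, "env_broad_scale": "urban biome [ENVO:01000249]"},
    {"sample_id": "S4", "host_taxid": 10090, "env_broad_scale": "urban biome [ENVO:01000249]"},
]


def group_by_broad_scale(records, host_taxid):
    """Group sample IDs of one host taxon by their broad-scale context."""
    groups = defaultdict(list)
    for rec in records:
        if rec["host_taxid"] == host_taxid:
            groups[rec["env_broad_scale"]].append(rec["sample_id"])
    return dict(groups)


human_groups = group_by_broad_scale(samples, host_taxid=9606)
print(human_groups)
```

In this sketch the host is selected through the taxon field and the biome is used only to split the results, which is the division of labour argued for above.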

wdduncan commented 3 years ago

Just a suggestion/observation:

Perhaps there is some confusion about the intent of the env_broad/local/medium terms.

pbuttigieg commented 3 years ago

@wdduncan quite likely - do you have a suggestion on how we can resolve it or where the core of the confusion is?

cmungall commented 3 years ago

Out of interest, I looked at the top values used in env_broad_scale for the host-associated package in INSDC via NCBI BioSample. The complete list is in https://github.com/INCATools/biosample-analysis/commit/ab953a44083d18c91465867f5aaa819034ea4948

These are the top N. As can be seen it's pretty ad-hoc and all over the place!

count value
513052 (empty value)
19529 not applicable
19250 urban biome
5693 coral reef
5237 gut
4347 missing
4172 marine biome
3090 lower digestive tract
3088 host-associated
2718 dense settlement biome
2639 mouse
2623 chicken intestine
2543 marine benthic biome
2505 mouse gut
2439 anthropogenic terrestrial biome
2435 not collected
2341 temperate biome
2304 farm
2249 NA
2127 intestine environment (ENVO:2100002)
2119 freshwater biome
2077 Mouse gut
2054 forest
1999 large river biome
1647 Gut
1634 Bos taurus taurus rumen microbiome
1506 host-associated habitat
1490 ENVO:01000219
1476 fecal material
1461 terrestrial biome ENVO:00000446
1432 gut microbiome
1416 anatomical entity environment
1383 research facility
1316 shrubland biome
1296 temperate forest biome
1285 mammalia-associated habitat
1221 Gut microbiome
1172 ocean biome
1140 ENVO:animal-associated habitat
1122 ENVO:00009002
1102 feces
1091 grassland biome
1066 savana
1065 Human-associated habitat
1051 animal-associated environment
1030 animal cage ENVO:01000922
1030 estuarine biome
1012 Rumen microbiome
959 terrestrial biome
952 Laboratory
931 N/A
917 subpolar coniferous forest biome
899 feces metagenome
886 intestine environment
841 animal distal gut
830 rangeland biome
824 chicken gut
803 mouse gut microbiome
763 Forest
757 tropical grassland biome
746 terresterial biome
727 stream biome
725 laboratory
724 fish gut biome
720 animal-associated environment [ENVO:01001002]
644 [ENVO:01000049]
633 Feces
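
As a rough sketch of how scattered these values are, the snippet below folds a handful of rows from the listing above into normalized buckets. The rows are copied from the listing; the set of "missing" spellings and the `normalize` helper are assumptions made for illustration, not part of the original analysis.

```python
import re
from collections import Counter

# A few (count, value) rows copied from the listing above.
rows = [
    (19529, "not applicable"),
    (5237, "gut"),
    (4347, "missing"),
    (2505, "mouse gut"),
    (2249, "NA"),
    (2077, "Mouse gut"),
    (1647, "Gut"),
    (1490, "ENVO:01000219"),
    (1461, "terrestrial biome ENVO:00000446"),
    (959, "terrestrial biome"),
    (931, "N/A"),
]

# Assumed spellings that all mean "no value was supplied".
MISSING = {"", "na", "n/a", "missing", "not applicable", "not collected"}
ENVO_ID = re.compile(r"ENVO:\d+")


def normalize(value):
    """Lower-case the label, drop any ENVO ID, and merge placeholders."""
    match = ENVO_ID.search(value)
    label = ENVO_ID.sub("", value).strip(" []()").lower()
    if not label and match:
        return match.group(0)  # bare ID with no label: keep the ID
    if label in MISSING:
        return "<missing>"
    return label


totals = Counter()
for count, value in rows:
    totals[normalize(value)] += count

for value, count in totals.most_common():
    print(count, value)
```

Even this simple case folding merges several rows (for example 'gut' with 'Gut'), which suggests that much of the spread is spelling variation rather than distinct environments.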

cmungall commented 3 years ago

When we further filter for human (NCBI taxon 9606) as the host:

count value
1118 dense settlement biome
1074 ENVO:01000219
863 urban biome
114 Human-associated habitat
44 not applicable
43 host-associated
29 airways
12 village biome
8 N/A
5 Wound
3 (empty value)
2 temperate
1 anatomical entity environment
1 Human periodontal pocket
1 Metagenomic RNA-seq
1 Human eye
1 Human cerebrospinal fluid

pbuttigieg commented 3 years ago

Thanks @cmungall - this shows that the GSC and we need to provide better documentation and outreach on how to annotate well.

@ramonawalls for the GSC CIG - this shows the need for validation: many of these values have nothing to do with ENVO or any other controlled vocabulary. The INSDC has never invested in validation, and this is the mess we get in return. Systems and brokerages like GFBIO help a great deal here @ikostadi
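
As a rough illustration of the kind of validation meant here, the sketch below checks whether a value has the "label [ENVO:ID]" shape used in the guidance quoted earlier (e.g. "urban biome [ENVO:01000249]"). The pattern, including the assumed 7 to 8 digit ID length, and the `check_env_value` helper are hypothetical. The check covers format only and does not confirm that the ID exists in ENVO or that the label matches it.

```python
import re

# Assumed shape: a free-text label, one space, then a bracketed ENVO ID.
# The 7-8 digit range is an assumption based on the IDs seen in this thread.
TERM_PATTERN = re.compile(r"^(?P<label>[^\[\]]+?) \[(?P<curie>ENVO:\d{7,8})\]$")


def check_env_value(value):
    """Return (label, curie) if the value is well-formed, else None."""
    m = TERM_PATTERN.match(value.strip())
    return (m.group("label"), m.group("curie")) if m else None


# Values copied from the listings above.
examples = [
    "urban biome [ENVO:01000249]",
    "animal-associated environment [ENVO:01001002]",
    "terrestrial biome ENVO:00000446",
    "intestine environment (ENVO:2100002)",
    "[ENVO:01000049]",
    "mouse gut",
]

for value in examples:
    status = "ok" if check_env_value(value) else "rejected"
    print(f"{status}: {value}")
```

A format check like this would reject most of the free-text values in the listings, but a real validator would also need to look each ID up in the ontology.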