GenomicsStandardsConsortium / mixs

"Minimum Information about any (X) Sequence" (MIxS) specification
https://w3id.org/mixs
Creative Commons Zero v1.0 Universal

"Symbiont" - ambiguous usage in the Symbiont-associated package #195

Open pbuttigieg opened 3 years ago

pbuttigieg commented 3 years ago

Describe the bug

In the symbiont-associated package in consideration for v6.0, the use of the term "symbiont" appears to be unclear and inconsistent.

@ramonawalls @pyilmaz @lschriml @only1chunts

Based on our CIG call with @ndheilly - the issue of semantic ambiguity with the term "symbiont" arose.

@ndheilly indicated that this term is meant to signify the host in a symbiotic relationship. For example, in a heterotrophic worm with bacterial communities providing chemolithoautotrophic nutrition, the worm would be the "symbiont".

However, this is counter to the common usage of the term (it is frequently said that the worm has bacterial symbionts) and is also not internally consistent in the proposed definitions of the symbiont-associated package (e.g. #179 #184 - if a symbiont has a host, and the symbiont is a host, then the host is also a symbiont)

Regardless of how many authors are on the pending paper, changing the meaning of a well-accepted biological term is not sensible for a standards organisation. If that paper changes the opinion of the broader community, such that textbook definitions change (or at least there's no significant or well-grounded debate and multiple review articles agree on the usage), that would be a different story.

"Symbiont" signifies any organism in a symbiotic relationship (especially the smaller partner). Arbitrarily changing this is unwise and sets a poor precedent.

Expected behavior

A semantically consistent rendition of symbionts and the relation of symbiont terms to host terms should be presented.

Additional context

This is a common problem across multiple MIxS packages, many of which have poor (e.g. circular, ambiguous, or inconsistent) and/or arbitrary definitions for their terms that conflict with the value syntax.

ndheilly commented 3 years ago

The worm, a parasite, is a symbiont, because it has a parasitic relationship with its host. The fact that it can also host microbes (bacteria, viruses etc.) does not change the fact that it is a symbiont itself. We do not "arbitrarily" change the definition.

pbuttigieg commented 3 years ago

@ndheilly thanks for the clarification. Sorry, I should have been more specific: I'm speaking of Riftia worms which do not have a parasitic relationship with any other organism. In my reading of most of the definitions I've come across, both the worms and bacteria would be considered symbionts as they both participate in a symbiotic relationship.

The changing of the definition was citing the notion, discussed in the call, that the symbiont is the host in a symbiotic relationship. Is this - in fact - the position taken in the paper? This seems inconsistent with your comment above, so this is likely just a misunderstanding, thus the need for precision here.

In your comment above, it seems that the symbiont is (then) the smaller organism in a symbiotic relationship (again, not a necessary constraint in most definitions).

pbuttigieg commented 3 years ago

@only1chunts if there is a position taken by the proposing group on how they use symbiont, and the CIG verifies that this can be coherently applied across the package, a line clarifying it should be appended to the description/definition of each term that pertains to a symbiont.

ndheilly commented 3 years ago

We are dealing with a Russian doll system: we have a host that hosts a symbiont, which can itself host even smaller microbes. The host is the host of the symbiont. The symbiont is the host of other microbes. The host-associated package is appropriate to characterize the microbiome of the host. The symbiont-associated package would be used for the samples aiming at characterizing the microbiome of the symbiont. In discussion with other GSC members, it has been agreed that we need to keep as many of the "terms" and associated definitions from other packages as possible. Hence, when someone is working with the SA package, references to the host will actually be references to the symbiont. When referencing the host of the symbiont, we use the wording "host of the symbiont". Please send me an email and I can share with you the manuscript for clarification of the context and definitions used.

pbuttigieg commented 3 years ago

Thanks @ndheilly - let's try to unpack this:

We are dealing with a Russian doll system: we have a host that hosts a symbiont, which can itself host even smaller microbes.

This is fine, if we use the generic definitions which seem to be broadly accepted - if they are all in symbiotic relationships with one another, they are all symbionts, some of which play host to others.

The host is the host of the symbiont. The symbiont is the host of other microbes.

This is where the ambiguity starts: If we were to read this strictly, this effectively chooses the "middle Matryoshka" as the symbiont.

The notion that these organisms are all symbionts of one another is more coherent. The host role is then defined elsewhere (see below).

The host-associated package is appropriate to characterize the microbiome of the host.

Yes, this package was designed for a simpler scenario: organism + its microbiome, where the organism plays the role of the host. This relationship, itself, can be considered a type of (usually mutualistic for a healthy microbiome) symbiosis.

The symbiont-associated package would be used for the samples aiming at characterizing the microbiome of the symbiont.

This is again where it gets a little iffy: the symbiont here is ambiguous - does this mean any organism in a symbiotic relationship? or just the symbiont that plays host to the other symbiont? or the symbiont that is hosted?

In discussion with other GSC members, it has been agreed that we need to keep as many of the "terms" and associated definitions from other packages as possible.

This is a general policy of reuse, yes.

Hence, when someone is working with the SA package, references to the host will actually be references to the symbiont.

Indeed, but which one? If we have a symbiosis between a buffalo and an oxpecker, both of those can have microbiomes that can be described by the host-associated package.

The case where using the host-associated field works well/unambiguously is when one of the symbionts is (part of) the microbiome of the other that plays host to it. But I don't think this package is scoped only to this type of relationship, right?

When referencing the host of the symbiont, we use the wording "host of the symbiont".

As above, this doesn't resolve the ambiguity, as the host is also a symbiont by most general definitions.

Please send me an email and I can share with you the manuscript for clarification of the context and definitions used.

Thank you, but discussions on this must be open on this tracker to honour the GSC's commitment to openness and transparency.

ndheilly commented 3 years ago

There is indeed a general international and cross-disciplinary problem with the definition of the word "symbiont". Clearly, the issue you raise is rather a systemic problem across the fields of medical science, ecology and veterinary science: from one field to another, definitions of the word "symbiont" are in direct conflict with definitions used in related fields. The host is only considered a "symbiont" in the field of ecology. This is why the boundaries of the usage of the SA package and the chosen definition are clearly stated in the manuscript: we took care to ensure that researchers across fields would be able to understand and agree with the chosen definition and boundaries.

pbuttigieg commented 3 years ago

Thanks @ndheilly - we would need the operational definition your consortium uses clearly stated here so we can verify that it works with the logic of the standard specification.

While there is no way to gauge how accepted that definition is until the paper is published and discussed in the literature, your consortium (as the proponents of the package) are free to take a position which the broader community can revise in future releases if there is a need to do so.

But we do need at least the definition and (highly preferred) some supporting argumentation around it made public here before we can write sound standard specifications.

@only1chunts @lschriml @ramonawalls this speaks to a procedural issue we should discuss more broadly in the CIG.

FatimaJorge commented 3 years ago

@pbuttigieg from our understanding the MIxS environmental packages were designed to "standardize sets of measurements and observations describing particular habitats" (Yilmaz et al. 2011).

There are packages for samples originating from general biological habitats such as plants and humans. Surely you could also use the host-associated package for samples collected from those habitats/hosts. But there was a need to further develop additional packages to better describe samples originating from those habitats. We have followed the same rationale, in fact for a far more complex type of habitat. The symbiont-associated package targets samples originating from a complex group of organisms ecologically defined as symbionts. Symbionts are organisms that can establish associations with other organisms which could be mutualistic (mutually beneficial association), commensal (beneficial association to one of the partners, but not harmful to the other), or parasitic (association detrimental to one of the partners). Such associations have been defined as host-symbiont associations. We are all aware of conflicts in definitions, especially now with the inclusion of microbes. But our package will be very useful for users who are familiar with this terminology.

In the proposed symbiont associated package the habitat/host of the samples are such organisms that establish associations with other organisms, i.e. symbionts. Using your example: "If we have a symbiosis between a buffalo and an oxpecker, both of those can have microbiomes that can be described by the host-associated package." Yes, you could potentially use the host associated package to describe samples collected from oxpecker. However, you could potentially miss a huge piece of information on the sources of microbes if in your metadata you do not add information on the buffalo. If you find it important to incorporate information on the locality (in fact it is a mandatory term), why not on the species interactions? Another example: How could you fully characterise a sample originating from the malaria parasite Plasmodium using simply the terms in host associated package? Can you fully describe the habitat of those samples? Wouldn't you find it necessary to add information not only of the host of the sample which is a symbiont, but also of the host of the symbiont, e.g. human, or a vector? We believe that this is where the symbiont associated package is a necessary addition to the existing ones.

What do we mean by symbiont? In the context of the symbiont-associated package, the term symbiont applies to macro and microorganisms that can establish a direct association with at least one other organism at some stage of their life cycle regardless of the nature and dependence of the interaction. A very important concept is the nestedness of symbiont-associated microbiota within and across host-symbiont-microbe interactions. This is in fact the backbone of the system we are characterising: the habitat of the sequenced sample is a symbiont. We are aware that the sample which was sequenced may include microbes that could be classified as symbionts too, but they are not the host of the sample, they are the sample itself!! So while in ecology we would have host-symbiont-microbe, in the package we have host of the symbiont (= host of the host of the sample) - symbiont (= host of the sample/habitat of the sample) - sample!!

Yes, it is complex, but we believe we have it clearly defined and that it is highly valuable.
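The three-level nesting described in this comment (host of the symbiont, symbiont as habitat of the sample, sample) can be written down as a small data structure. The following is only an illustrative sketch: the class and function names (`Organism`, `Sample`, `sa_roles`) are hypothetical and are not MIxS terms, and the pinworm example is borrowed from elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organism:
    """An organism that may itself live in or on another organism."""
    name: str
    host: Optional["Organism"] = None  # None: no host recorded


@dataclass
class Sample:
    """A sequenced sample and the organism that is its habitat."""
    description: str
    habitat: Organism  # in the SA package reading, this is "the symbiont"


def sa_roles(sample: Sample) -> dict:
    """Map a sample onto the three levels named in the comment above."""
    symbiont = sample.habitat
    return {
        "sample": sample.description,
        "symbiont (host of the sample)": symbiont.name,
        "host of the symbiont": symbiont.host.name if symbiont.host else None,
    }


# Hypothetical example: microbiome of a pinworm taken from a human gut.
human = Organism("Homo sapiens")
pinworm = Organism("Enterobius vermicularis", host=human)
sample = Sample("pinworm-associated metagenome", habitat=pinworm)
roles = sa_roles(sample)
```

Under this reading, "symbiont" is fixed by position (the organism the sample was taken from), which is the point the rest of the thread questions.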

only1chunts commented 3 years ago

Like most people on this thread, I have more questions than I do answers, but perhaps we can try to list some specific examples, to see if we agree as to which would actually use the new SA package?

| sample description | checklist | package |
| --- | --- | --- |
| metagenome sequences from a human blood sample (may or may not contain P. falciparum) | mims | human-associated |
| metagenome sequences from a whole wild-caught mosquito (may or may not contain P. falciparum) | mims | host-associated |
| metagenome sequences from a water sample where malaria mosquitoes are present | mims | water |
| sequences of a single P. falciparum cell isolated from a human blood sample | migs_eu | human-associated |
| sequences of a single P. falciparum cell isolated from a mosquito | migs_eu | host-associated |
| metagenome sequences from E. vermicularis (pinworm) isolated from human intestine | mims | symbiont-associated |
| metagenome sequences from E. vermicularis (pinworm) from a non-host environment | mims | host-associated |
| metagenome sequences from gut of human (with pinworms) | mims | human-associated |

Can anyone add a couple more examples of a sequence sample that might use the SA package, please?

NB - I've taken the view that unless the "symbiont" has been directly isolated from a specific host prior to sampling then you cannot add specific information about the host of the symbiont.

If you disagree with the above table please explain where I have made a mistake and why? I foresee contention in the last item on that list- my argument for using human-associated instead of SA package is that the infection^ of the human host can be included in the sample metadata using the newly added term "observed host symbionts" (^-I'm not sure if infection is the right word here? but hopefully the meaning is understood)
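The NB above can be read as a consistency rule on a metadata record. The sketch below encodes only that one reading, which is contested later in the thread. The record keys (`package`, `isolated_from_host`, `host_of_symbiont`) are hypothetical names chosen for illustration and are not actual MIxS term names.

```python
from typing import List


def check_host_of_symbiont(record: dict) -> List[str]:
    """Return problems found under the reading in the NB above.

    Assumption: host-of-symbiont details are only justified when the
    symbiont was directly isolated from a specific host before sampling.
    """
    problems = []
    isolated = record.get("isolated_from_host", False)
    if record.get("host_of_symbiont") and not isolated:
        problems.append(
            "host_of_symbiont is set, but the symbiont was not "
            "isolated from a specific host"
        )
    if record.get("package") == "symbiont-associated" and not isolated:
        problems.append(
            "symbiont-associated chosen without isolation from a specific host"
        )
    return problems


# Two rows from the table above, expressed as hypothetical records.
pinworm_from_gut = {
    "package": "symbiont-associated",
    "isolated_from_host": True,
    "host_of_symbiont": "Homo sapiens",
}
pinworm_from_environment = {
    "package": "symbiont-associated",
    "isolated_from_host": False,
}
ok = check_host_of_symbiont(pinworm_from_gut)
flagged = check_host_of_symbiont(pinworm_from_environment)
```

Here the first record passes, while the second is flagged, which matches the table's choice of host-associated for a pinworm taken from a non-host environment.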

ndheilly commented 3 years ago

No worries. Here is the revision (I have no idea how to make a table in here):

Sequences of a single P. falciparum cell isolated from human blood sample | migs_ba | symbiont-associated --> remember that P. falciparum may host microbes itself.
Sequences of a single P. falciparum cell isolated from a Mosquito | migs_ba | symbiont-associated --> exactly the same context as above.
metagenome sequences from E. vermicularis (pinworm) isolated from human intestine | mims | symbiont-associated --> Yes.
metagenome sequences from E. vermicularis (pinworm) from a non-host environment | mims | symbiont-associated --> the pinworm is a known parasite and there is no ambiguity.

It is indeed discussed in the paper that parasites may have free-living life stages and can be collected from the environment. We provide two examples (one with plants, one with a trematode with a complex life cycle) to show all the possibilities and emphasize the scientific interest in comparing the microbiome of a symbiont with the microbiome of water/host tissue/soil etc. in its direct environment.

I hope this helps.

Nolwenn

pyilmaz commented 3 years ago

I am a little confused regarding the choice of migs_ba for the P. falciparum examples. If the Plasmodium cells are sequenced, shouldn't the checklist be migs_eu?

only1chunts commented 3 years ago

@pyilmaz, sorry, my bad, you are correct: P. falciparum is not a bacterium, it's a single-celled eukaryote, so it should in fact use migs_eu (now corrected in the table above).

only1chunts commented 3 years ago

Thanks @ndheilly, this is interesting to see where we differ in point of view on sampling. I think text discussion here is not going to resolve anything so hopefully we can discuss this more on the zoom call tomorrow and add notes here afterwards. FYI, if you are interested about how to insert a table, see guide here.

pbuttigieg commented 3 years ago

Thanks @FatimaJorge

In the context of the symbiont-associated package, the term symbiont applies to macro and microorganisms that can establish a direct association with at least one other organism at some stage of their life cycle regardless of the nature and dependence of the interaction.

This is a good starting point and conforms with the common notion of "symbiont". I think we can generalise "macro and microorganisms" to "organisms".

The one point of ambiguity is the "direct association" - what is that? Predation is also a direct association.

@only1chunts thanks for the breakdown, I'm seeing much the same as you do. The "direct association" ambiguity is likely why there's a divergence from @ndheilly's analysis.

NB - I've taken the view that unless the "symbiont" has been directly isolated from a specific host prior to sampling then you cannot add specific information about the host of the symbiont.

I'd follow that logic too - you first have to know (or have strong reason to believe) that what you've isolated was in symbiosis with the host you isolated it from. Both are symbionts, following the definition above.

Sequences of a single P. falciparum cell isolated from human blood sample | migs_ba | symbiont-associated --> remember that P. falciparum may host microbes itself.

I'm not sure where the microbes come in to that case. The issue is whether the P. falciparum is in symbiosis with the human, rather than just around in the system.

metagenome sequences from E. vermicularis (pinworm) from a non-host environment | mims | symbiont-associated --> the pinworm is a known parasite and there is no ambiguity.

@only1chunts if this is the logic, then it seems that any organism known to engage in symbiotic relationships would qualify. This doesn't sit well on my side, as the metadata package is then not necessarily relevant to the sampling environment. If it was isolated from a host (not necessarily a symbiont), then the host-associated package would be appropriate. If it was isolated from some water, then the water package would be relevant, etc. Somewhat confusingly, this thinking seems to be in line with:

It is indeed discussed in the paper that parasites may have free-living life stages and can be collected from the environment. We provide two examples (one with plants, one with a trematode with a complex life cycle) to show all the possibilities and emphasize the scientific interest in comparing the microbiome of a symbiont with the microbiome of water/host tissue/soil etc. in its direct environment.

If there are free-living stages, and the organism (a potential symbiont) is sampled from an environment and is sequenced, the relevant environmental package(s) describing the environment of the organism should be used over the symbiont-associated package for the organism.

However, this case focuses on the microbiome of the organism - the assertions of this case appear to be: if one sequences the microbiome of an organism which is known to enter (at some stage) a symbiotic relationship with another, then the symbiont-associated package should be used.

This echoes @FatimaJorge's original logic

Using your example: "If we have a symbiosis between a buffalo and an oxpecker, both of those can have microbiomes that can be described by the host-associated package." Yes, you could potentially use the host associated package to describe samples collected from oxpecker. However, you could potentially miss a huge piece of information on the sources of microbes if in your metadata you do not add information on the buffalo. If you find it important to incorporate information on the locality (in fact it is a mandatory term), why not on the species interactions?

In this set up, we have an oxpecker (O), buffalo (B), and their respective microbiomes (o,b). The argument is that to understand o, you need to understand that O is in a symbiotic relationship with B (and thus, b), in a similar way you need to understand the environment of O.

Thus, because O is in a symbiotic relationship with B, the symbiont-associated package would be used over the host-associated package in order to gather more information on B, as its association with O is so close that o and b are likely persistently impacted.

We then go back to the definition itself, and need some sort of threshold on "direct association" - when should a MIxS user choose the symbiont-associated package over the host-associated package?

Another example: How could you fully characterise a sample originating from the malaria parasite Plasmodium using simply the terms in host associated package? Can you fully describe the habitat of those samples? Wouldn't you find it necessary to add information not only of the host of the sample which is a symbiont, but also of the host of the symbiont, e.g. human, or a vector? We believe that this is where the symbiont associated package is a necessary addition to the existing ones.

In this example - I assume we're talking about the microbiome (p) of the Plasmodium (P). Note that - in a minimal metadata checklist - there is no attempt to fully characterise the environments of samples, just the kernel information profile, which the expert base helps define.

I would find it necessary to add information on the host of P if and only if P is currently in a symbiotic relationship with such a host.
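One way to make the question "when should a MIxS user choose the symbiont-associated package?" concrete is to write the rule argued for in this comment as a function. This is a sketch of one possible reading, not an agreed rule: the function name, its parameters, and the collapsing of "direct association" into a single boolean are all assumptions made for illustration.

```python
def suggest_package(
    target: str,
    currently_in_symbiosis: bool,
    environment_package: str = "water",
) -> str:
    """Suggest a package under one reading of the comment above.

    target: "microbiome" if the microbiome of an organism is sequenced,
            "organism" if the organism itself is sequenced.
    currently_in_symbiosis: whether the organism was in a symbiotic
            relationship with a known host when it was sampled.
    environment_package: hypothetical fallback for free-living stages.
    """
    if target == "microbiome":
        # Host-of-symbiont information is added if and only if the
        # organism is currently in symbiosis with such a host.
        if currently_in_symbiosis:
            return "symbiont-associated"
        return "host-associated"
    if target == "organism":
        # An organism taken from a host is described by the host-associated
        # package; a free-living stage by its environmental package.
        if currently_in_symbiosis:
            return "host-associated"
        return environment_package
    raise ValueError(f"unknown target: {target!r}")


# Microbiome (p) of Plasmodium (P) while P is inside a host.
p_in_host = suggest_package("microbiome", currently_in_symbiosis=True)
# Microbiome of a free-living stage collected from water.
p_free = suggest_package("microbiome", currently_in_symbiosis=False)
```

The single boolean hides exactly the threshold that remains open: how close an association must be before it counts.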