GDKO / AvP

Automatic evaluation of HGTs
GNU General Public License v3.0

Hi, I want to know the meaning of "Ingroup" and "EGP" #23

Open leech1225 opened 2 weeks ago

leech1225 commented 2 weeks ago

As you said, you want to find HGTs from "Non Metazoa" species to your species. Your example:

Ingroup:
  33208: Metazoa
EGP:
  6300: Tylenchida

I checked that "Metazoa" has the rank "kingdom", while "Tylenchida" has a rank lower than "kingdom". But "Ingroup" is described as the target of the HGT, and "EGP" as the taxonomic groups to exclude from calculations. My question: is the "Ingroup" rank the one under investigation or the one not under investigation? My output shows that "Ingroup" means the rank that is not under investigation, which contradicts your explanation.

leech1225 commented 2 weeks ago

My config:

Ingroup:
  2157: Archaea
EGP:
  1590: Lactiplantibacillus plantarum

I want to find HGTs from Archaea, so I set it this way (Lactiplantibacillus plantarum is a species of Bacteria). But after ai, I found that the donors are Bacteria too (not Archaea).

leech1225 commented 2 weeks ago

Thanks for your wonderful toolkit. I want to find any HGT from non-Bacteria and from lower ranks of Bacteria. How should I set the groups.yaml?

GDKO commented 2 weeks ago

Hi @leech1225,

Ingroup is for finding donors outside of this rank; EGP is to exclude this rank from the HGT calculations.

Check the following comment for more information.

If you need any further help reply with your species name and I can give more specific examples.

Cheers, Georgios

leech1225 commented 2 weeks ago

Thanks for your help, captain @GDKO. What still puzzles me is that "Ingroup" and "EGP" both seem to mean "exclude", so what is the difference between them? For example, what is the difference between the following two configs?

1:
ingroup:
  2: Bacteria
EGP:
  1590: Lactiplantibacillus plantarum

2:
ingroup:
  1590: Lactiplantibacillus plantarum
EGP:
  2: Bacteria

Glad to talk with you through the comment 🙏

lagphase commented 2 weeks ago

Dear @GDKO,

I'm still confused after reading this thread and the one you mentioned, #15. If "EGP" is to exclude this rank in HGTs, then why do you write:

If you set exclude to Saccharomyces and ingroup to Saccharomycetaceae you are searching for HGTs present in S.cerevisiae and maybe in other species from the genus Saccharomyces but absent in the other genera of Saccharomycetaceae

Or in your example:

In the following example we have proteins from the nematode Meloidogyne incognita and we want to find HGTs from Non Metazoa species to our species. For that we set Ingroup to Metazoa and EGP to the suborder Tylenchida which our species belongs to, to allow for HGTs that may be present also in other Tylenchida species

It seems to me that HGTs are searched for in the EGP group and that the Ingroup is excluded...

leech1225 commented 2 weeks ago

From the several groups.yaml files I tried, AvP seems to detect HGTs whose donors are outside the "Ingroup" rank, while excluding the taxa under the "EGP" rank.

GDKO commented 2 weeks ago

Hi @leech1225 and @lagphase ,

Assuming that your proteins belong to Lactiplantibacillus plantarum

1:
ingroup:
  2: Bacteria
EGP:
  1590: Lactiplantibacillus plantarum

This will search for HGTs with donors outside of Bacteria that are present in Lactiplantibacillus plantarum but not in other Bacteria.

2:
ingroup:
  1590: Lactiplantibacillus plantarum
EGP:
  2: Bacteria

This will search for HGTs in your species with donors outside of Bacteria that can also be present in other Bacteria. Here, you will not be able to distinguish between vertical and horizontal transmission.
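The difference between the two configs can be sketched in a few lines of Python. This is not AvP's code, only a minimal illustration under an assumed rule: a hit whose lineage contains the EGP taxon is skipped first, a remaining hit inside the Ingroup counts as ingroup evidence, and everything else is a potential donor. The helper names (`classify`, `partition`) and the two non-plantarum example species are hypothetical.

```python
# Illustrative root-to-tip lineages for three database hits.
# Escherichia coli and Methanobrevibacter smithii are arbitrary stand-ins.
LINEAGES = {
    "L. plantarum hit": ["Bacteria", "Lactiplantibacillus plantarum"],
    "other bacterial hit": ["Bacteria", "Escherichia coli"],
    "archaeal hit": ["Archaea", "Methanobrevibacter smithii"],
}


def classify(lineage, ingroup, egp):
    """Label one hit as 'excluded', 'ingroup' or 'donor' (assumed rule)."""
    if egp in lineage:       # EGP taxa are dropped from the calculation first
        return "excluded"
    if ingroup in lineage:   # remaining Ingroup hits suggest vertical descent
        return "ingroup"
    return "donor"           # anything outside the Ingroup is a potential donor


def partition(ingroup, egp):
    """Classify every example hit under one Ingroup/EGP setting."""
    return {name: classify(lin, ingroup, egp) for name, lin in LINEAGES.items()}


config_1 = partition(ingroup="Bacteria", egp="Lactiplantibacillus plantarum")
config_2 = partition(ingroup="Lactiplantibacillus plantarum", egp="Bacteria")

print("config 1:", config_1)
print("config 2:", config_2)
```

Under this assumed rule, config 1 keeps the other bacterial hit as ingroup evidence, whereas config 2 excludes every bacterial hit and leaves no ingroup evidence at all, which is one way to read why vertical and horizontal transmission cannot be told apart there.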

Think about it in this way. We want to distinguish between vertical and horizontal transfers.

Let's take the following as an example. We have sequenced S. cerevisiae, which has the following taxonomy:

phylum:Ascomycota;class:Saccharomycetes;order:Saccharomycetales;family:Saccharomycetaceae;genus:Saccharomyces

We assume that if a protein in our species is more similar to proteins in other Ascomycota than to proteins outside of Ascomycota, this is an indication of vertical transmission. In that case we set Ingroup to Ascomycota. Now, since S. cerevisiae is in the Ascomycota phylum, we can set EGP to different ranks depending on our question.

  1. If we set EGP to S. cerevisiae, then we consider the HGT candidates that are more similar to proteins outside of Ascomycota than to proteins inside Ascomycota (excluding S. cerevisiae). In other words, we are searching for HGTs that, among all Ascomycota, are present only in S. cerevisiae.
  2. If we set EGP to Saccharomyces, we are searching for HGTs that are present in our species and may also be present in other Saccharomyces, but not in other Ascomycota. This means that for some HGT candidates the transfer may have happened in an ancestral Saccharomyces species and then been vertically transmitted to our species.

Let's assume a protein was transferred from a bacterial species to S. cerevisiae.

Let's assume a protein was transferred from a bacterial species to the last common ancestor of Saccharomyces.
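The two transfer scenarios above can be played through with a toy presence/absence model. This is a deliberate simplification and not how AvP scores candidates (the thread describes a comparison of similarity, not mere presence): here a candidate "looks horizontal" if, after dropping the EGP taxa, a homolog exists outside the Ingroup and none remains inside it. S. paradoxus is used only as an example of a second Saccharomyces species, and `looks_horizontal` is a hypothetical helper.

```python
# Lineage of the genus, taken from the taxonomy string in the thread.
GENUS = ["Ascomycota", "Saccharomycetes", "Saccharomycetales",
         "Saccharomycetaceae", "Saccharomyces"]

LINEAGES = {
    "S. cerevisiae": GENUS + ["S. cerevisiae"],
    "S. paradoxus": GENUS + ["S. paradoxus"],  # illustrative sister species
    "bacterium": ["Bacteria"],
}


def looks_horizontal(taxa_with_homolog, ingroup, egp):
    """Toy rule: donors present and no Ingroup hit left once EGP is excluded."""
    kept = [t for t in taxa_with_homolog if egp not in LINEAGES[t]]
    ingroup_hits = [t for t in kept if ingroup in LINEAGES[t]]
    donors = [t for t in kept if ingroup not in LINEAGES[t]]
    return bool(donors) and not ingroup_hits


# Which taxa carry a homolog of the transferred protein in each scenario.
scenarios = {
    "transfer into S. cerevisiae": {"S. cerevisiae", "bacterium"},
    "transfer into the Saccharomyces ancestor": {
        "S. cerevisiae", "S. paradoxus", "bacterium"},
}

results = {}
for name, taxa in scenarios.items():
    for egp in ("S. cerevisiae", "Saccharomyces"):
        results[(name, egp)] = looks_horizontal(taxa, "Ascomycota", egp)
        print(f"{name} | EGP = {egp} -> {results[(name, egp)]}")
```

In this toy model the species-specific transfer is flagged under both EGP settings, while the ancestral transfer is flagged only when EGP is set to the genus, because with EGP at species level the homolog in the sister species counts as ingroup evidence.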

In all cases our species should be equal to or inside the EGP rank and the EGP rank should be inside the Ingroup rank, otherwise we will not be able to distinguish between vertical and horizontal transfers. Depending on the question, the user needs to specify Ingroup and EGP.
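The nesting rule above (species equal to or inside EGP, EGP inside Ingroup) can be checked before a run. The following is a minimal sketch, not part of AvP: `check_groups` is a hypothetical helper, the lineage is written out by hand rather than looked up in a taxonomy database, and "inside" is read here as strictly below the Ingroup.

```python
# Root-to-tip lineage of the query species, from the taxonomy in the thread.
S_CEREVISIAE = ["Ascomycota", "Saccharomycetes", "Saccharomycetales",
                "Saccharomycetaceae", "Saccharomyces", "S. cerevisiae"]


def check_groups(species_lineage, ingroup, egp):
    """Return a list of problems; an empty list means the nesting is valid."""
    problems = []
    if ingroup not in species_lineage:
        problems.append(f"species is not inside Ingroup '{ingroup}'")
    if egp not in species_lineage:
        problems.append(f"species is not equal to or inside EGP '{egp}'")
    if not problems:
        # A larger index means a lower (more specific) rank in the lineage.
        if species_lineage.index(egp) <= species_lineage.index(ingroup):
            problems.append(f"EGP '{egp}' is not inside Ingroup '{ingroup}'")
    return problems


print(check_groups(S_CEREVISIAE, "Ascomycota", "Saccharomyces"))  # valid nesting
print(check_groups(S_CEREVISIAE, "Saccharomyces", "Ascomycota"))  # swapped ranks
print(check_groups(S_CEREVISIAE, "Bacteria", "S. cerevisiae"))    # wrong Ingroup
```

A swapped pair, such as the second config discussed earlier in the thread (species as Ingroup, Bacteria as EGP), would fail the last check in the same way as the swapped yeast example here.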

leech1225 commented 2 weeks ago

Hey @GDKO, thanks for your devotion! I've figured it out!