OpenTreeOfLife / taxomachine

taxonomy graphdb

contexts for microbes #43

Closed chinchliff closed 10 years ago

chinchliff commented 10 years ago

@hdliv, we can add TNRS contexts for microbial groups if you want to define some for me. Note that they must be monophyletic groups in the taxonomy for the contexts to make sense. I just need the official OTT taxon names and, if the context name should differ from the taxon name, the context name too. E.g. "Embryophyta" (taxon name) = "Land plants" (context name)

hdliv commented 10 years ago

@chinchliff Thanks! I don't know how many microbes are in there. I took a quick look but will have to spend more time on it later. However, we have talked about the other issue: MANY names don't map to monophyletic clades, because molecular data are ahead of taxonomy. What is the solution for names that don't map to monophyletic groups? It would be nice to be able to see them so we can 'fix' them.

jar398 commented 10 years ago

Not sure what you mean, since to me all clades are monophyletic by definition. Can you give examples? If the taxonomy has taxa that aren't clades, they should be removed and a new classification added that is more likely to be consistent with phylogeny. If this can be done automatically somehow, e.g. based on SILVA, we should think about scripting it.

I gave a list of major eukaryote groups in a separate issue, I think in this repository. Those would be candidates for being TNRS contexts.


hdliv commented 10 years ago

@jar398 I'm not sure we are speaking the same language. What I mean is a way to visualize these 'yet to be monophyletic clades', i.e. groups that carry the same name in more than one place in the tree, or polyphyletic clades. Below are some studies with trees that show what I am talking about. These situations are common in all microbe studies, so how do we visualize them?

http://dx.doi.org/10.2216/11-32.1
http://fottea.czechphycology.cz/_contents/F11-1-2011-13.pdf
http://dx.doi.org/10.11646/phytotaxa.163.5.1
http://dx.doi.org/10.11646/phytotaxa.163.4.2
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0040153
doi:10.1128/AEM.69.9.5157-5169.2003

jar398 commented 10 years ago

I think I am saying "taxon" where you are saying "clade". A taxon can be nonmonophyletic; a clade can't. Anyhow, I wonder if you could file this as a separate issue in the feedback repo. It has to do with treemachine and the web app, but not taxomachine. Thanks.


hdliv commented 10 years ago

OK. I don't know what has to do with what :). I just wanted to comment on visualizing polyphyly, since it comes up all the time.

As to @chinchliff: do you need names like SAR (Stramenopila, Alveolata, Rhizaria), Bacteria, Archaea, Excavata, Amoebozoa, Centrohelida, Haptophyta, Apusozoa, etc.?

chinchliff commented 10 years ago

Yes. Contexts give users (and software) a way to limit the search scope of TNRS queries. They must be taxon names, like those you mentioned. It doesn't really make sense to have contexts for anything except very major groups, because smaller ones would just be distracting and offer no advantage. Should I make contexts for the groups you listed? Are there more? Do you want the display names to be the same as the taxon names, or are there less technical names (e.g. "land plants" or "vertebrates") that would apply?
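One way to picture a context is as a subtree of the taxonomy: a TNRS query limited to a context only returns matches whose lineage passes through the context's root. The following is a minimal Python sketch under that assumption. The `Taxon` class and `tnrs_lookup` function are hypothetical and are not taxomachine's actual implementation. The toy tree omits intermediate ranks and uses the real homonym *Morus* (a plant genus and a bird genus).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # compare taxa by identity, not by name
class Taxon:
    """Hypothetical taxonomy node: a name plus a link to its parent."""
    name: str
    parent: Optional[Taxon] = None

    def lineage(self) -> Iterator[Taxon]:
        """Yield this taxon and each ancestor up to the root."""
        node: Optional[Taxon] = self
        while node is not None:
            yield node
            node = node.parent


def tnrs_lookup(query: str, taxa: list[Taxon], context: Optional[str] = None) -> list[Taxon]:
    """Exact-name lookup, optionally limited to taxa nested in `context`."""
    hits = [t for t in taxa if t.name == query]
    if context is None:
        return hits  # no context: search all of life
    return [t for t in hits if any(a.name == context for a in t.lineage())]


# Toy tree: "Morus" is both a plant genus and a bird genus.
life = Taxon("life")
metazoa = Taxon("Metazoa", life)
embryophyta = Taxon("Embryophyta", life)
morus_bird = Taxon("Morus", metazoa)
morus_plant = Taxon("Morus", embryophyta)
taxa = [life, metazoa, embryophyta, morus_bird, morus_plant]

print(len(tnrs_lookup("Morus", taxa)))                           # 2: ambiguous
print(tnrs_lookup("Morus", taxa, "Embryophyta")[0].parent.name)  # Embryophyta
```

Without a context the name is ambiguous; with the "Embryophyta" context only the plant genus comes back, which is the scope-limiting behavior described above.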


hdliv commented 10 years ago

Bacillariophyta = 'Diatoms'
Ciliophora = 'Ciliates'
Foraminifera = 'Forams'

chinchliff commented 10 years ago

Ok, so I am going to define the following taxa as TNRS contexts. One piece of secondary information that is useful (but not necessary) is the governing nomenclatural code(s) for each of these groups. If you know them, please let me know and I will add them.

SAR = "SAR group"
Bacteria
Archaea
Excavata
Amoebozoa = "Amoebozoans"
Centrohelida
Haptophyta
Apusozoa
Bacillariophyta = "Diatoms"
Ciliophora = "Ciliates"
Foraminifera


hdliv commented 10 years ago

@chinchliff Just so we are on the same page: these aren't all of the eukaryotic groups. We are missing others, such as Opisthokonta and Archaeplastida (your favorite).

Amoebozoans = "Amoebae"
Foraminifera = "Forams"

As for governing nomenclatural codes, it is a little messy at the level of the large groups:
Bacteria: International Code of Nomenclature of Prokaryotes (ICNP), except for Cyanobacteria
Cyanobacteria: International Code of Nomenclature for algae, fungi, and plants (ICN) and ICNP (there has been a request for the ICNP to stop governing Cyanobacteria)
Archaea: ICNP
Excavata: ICN and International Code of Zoological Nomenclature (ICZN)
Amoebozoa: ICZN
Centrohelida: ICZN
Haptophyta: ICN and ICZN
Apusozoa: ICZN
Bacillariophyta: ICN and ICZN
Ciliophora: ICN and ICZN
Foraminifera: ICZN
SAR: ICN and ICZN

chinchliff commented 10 years ago

Ok, I have added these to branch https://github.com/OpenTreeOfLife/taxomachine/tree/new_features. I have not added Opisthokonta or Archaeplastida as they contain large groups that are already separately indexed: Metazoa, Fungi, and Embryophyta. Anything not in those groups can still be found in the context of all life.

hdliv commented 10 years ago

Before this is closed:

The Archaeplastida include many microbes (microbial representatives of the glaucophytes, green algae, and red algae).

But I wanted to return to the list that Laura posted earlier. Maybe it would be better to list each group separately:

Amoebozoa
Archaeplastida, plus nested clades Rhodophyceae, Glaucophyta, Chloroplastida
Excavata
Opisthokonta, plus nested clades Fungi and Metazoa
SAR, plus nested clades Alveolata, Stramenopila, Rhizaria (each of which could be nested further, but this may be sufficient for now)

Archaea: might not need subnesting, but here are some major clades: Crenarchaeota, Euryarchaeota, Korarchaeota, Nanoarchaeota, Thaumarchaeota

Bacteria: might not need subnesting, but here are some major clades: Acidobacteria, Actinobacteria, Aquificae, Armatimonadetes, Bacteroidetes, Candidate divisions, Chlamydiae, Chlorobi, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Deinococcus-Thermus, Dictyoglomi, Elusimicrobia, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Lentisphaerae, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetae, Synergistetes, TA06, Tenericutes, Thermodesulfobacteria, Thermotogae, Verrucomicrobia

jar398 commented 10 years ago

The downside of adding lots of contexts (one could in principle add every taxon as a context) is that if an organism is misclassified, and you select the context that matches its correct placement rather than its placement in the taxonomy, you will never see that taxon.
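The failure mode described above can be reproduced in a few lines. In this minimal sketch the taxonomy is a hypothetical child-to-parent dictionary, and `Exampleia` is a made-up genus that the taxonomy files under Excavata even though (we suppose) phylogeny places it in SAR. All names and functions here are illustrative only.

```python
from __future__ import annotations

# Hypothetical toy taxonomy: child name -> parent name ("life" is the root).
PARENT = {
    "Excavata": "life",
    "SAR": "life",
    "Exampleia": "Excavata",  # misclassified; suppose it really belongs in SAR
}


def in_context(name: str, context: str) -> bool:
    """True if `context` is `name` itself or one of its ancestors."""
    node = name
    while node is not None:
        if node == context:
            return True
        node = PARENT.get(node)  # None once we walk past the root
    return False


def search(query: str, context: str = "life") -> list[str]:
    """Exact-name search restricted to the given context."""
    return [n for n in PARENT if n == query and in_context(n, context)]


print(search("Exampleia", "SAR"))  # [] : the "correct" context misses it
print(search("Exampleia"))         # ['Exampleia'] : found in all of life
```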

Another downside is that the more contexts there are, the more names people have to read through when choosing a context. As a user interface consideration, you want to keep the number of contexts to a bare minimum.

The only reason to add a context is to make homonym selection easier. If taxonomists followed the codes, which forbid homonyms within each code, there would only need to be one context per code, rooted at the highest taxon that code governs. If you think there are going to be homonyms between the proposed contexts and other contexts, that would be a reason to include them.
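Following that reasoning, one way to judge whether a proposed context earns its place is to check whether any name occurs in more than one candidate context. The sketch below assumes each context's names are available as a plain set; `homonyms_across_contexts` is a hypothetical helper, and the toy data reuse the real plant/bird homonym *Morus*.

```python
from __future__ import annotations

from collections import defaultdict


def homonyms_across_contexts(names_by_context: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each name found in two or more contexts to those contexts (sorted)."""
    contexts_for: defaultdict[str, set[str]] = defaultdict(set)
    for context, names in names_by_context.items():
        for name in names:
            contexts_for[name].add(context)
    return {name: sorted(ctxs) for name, ctxs in contexts_for.items() if len(ctxs) > 1}


toy = {
    "Metazoa": {"Morus", "Homo"},
    "Embryophyta": {"Morus", "Quercus"},
}
print(homonyms_across_contexts(toy))  # {'Morus': ['Embryophyta', 'Metazoa']}
```

By this argument, a candidate context whose names never collide with names elsewhere mostly adds another entry to the menu.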


hdliv commented 10 years ago

The exhaustive list was not necessarily meant for contexts; it is more useful for listing the code of nomenclature for each group.
For example, the Cyanobacteria are the only group within the Bacteria governed by two Codes, so it is not the group 'Bacteria' as a whole that is governed by two Codes; you have to go down to the next level.

The same goes for Excavata: the euglenoids are governed by the ICN and the rest by the ICZN. Anything broadly considered 'algae' that has close non-photosynthetic 'cousins' falls under a different Code from those cousins. So, in my opinion, it is better to go down another level.
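The "go down a level" approach can be represented by recording codes only where they change and letting every other taxon inherit the code of its nearest annotated ancestor. This is a minimal sketch, assuming a hypothetical child-to-parent dictionary with intermediate ranks omitted. The code assignments follow the comments in this thread (ICNP = International Code of Nomenclature of Prokaryotes, ICN = International Code of Nomenclature for algae, fungi, and plants, ICZN = International Code of Zoological Nomenclature); `governing_codes` is an illustrative name, not an existing function.

```python
from __future__ import annotations

# Hypothetical toy taxonomy (child -> parent); intermediate ranks omitted.
PARENT = {
    "Bacteria": "life",
    "Cyanobacteria": "Bacteria",
    "Proteobacteria": "Bacteria",
    "Eukaryota": "life",
    "Excavata": "Eukaryota",
    "Euglenida": "Excavata",
    "Jakobida": "Excavata",
}

# Codes are recorded only where they differ from the parent group,
# following the assignments discussed in this thread.
CODES = {
    "Bacteria": ("ICNP",),
    "Cyanobacteria": ("ICN", "ICNP"),  # the only bacterial group under two codes
    "Excavata": ("ICZN",),
    "Euglenida": ("ICN",),             # euglenoids, per the comment above
}


def governing_codes(name: str) -> tuple[str, ...]:
    """Return the codes of the taxon itself or its nearest annotated ancestor."""
    node = name
    while node is not None:
        if node in CODES:
            return CODES[node]
        node = PARENT.get(node)
    return ()  # nothing annotated on the path to the root


print(governing_codes("Proteobacteria"))  # ('ICNP',)
print(governing_codes("Cyanobacteria"))   # ('ICN', 'ICNP')
```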

chinchliff commented 10 years ago

In general, the purpose of the contexts is to facilitate TNRS. They exist for the convenience of avoiding homonyms and for improving speed. Creating more than we need does not really serve that purpose. For groups with fewer than 100,000 species, the advantages of subdividing contexts are relatively small. Also, people have to navigate a list of the contexts to use them (in the GUI anyway), and a long list that is difficult to browse undermines their usefulness. In fact, for these reasons I may remove some contexts from some of the other groups at some point.

It may be useful to provide contexts for some of the other large groups inside Archaeplastida, but I do not think it will be particularly helpful to have Archaeplastida itself as a context because (1) there are not many cases where the TNRS is going to be used to search across all plants and algae, and (2) nested contexts balloon the size of the database and affect overall speed and disk usage. In fact, because of the way the curator app works, it is possible to perform multiple searches across non-overlapping contexts to get the same effect, and this is in many ways more efficient.

So, I will add Glaucophyta, Rhodophyta, and Chlorophyta. If there are other groups that are not represented, which people are likely to want to use to limit the scope of their TNRS searches, let me know.
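The point about running several searches over non-overlapping contexts can be sketched as follows. Assuming the contexts are disjoint, querying each one and merging the hits returns the same matches as a single query over their union, and each hit is additionally tagged with the context it came from. The index layout and the function names are hypothetical; the genera are real examples from each group (*Cyanophora*, *Porphyra*, *Chlamydomonas*), but the name sets are tiny samples, not real indexes.

```python
from __future__ import annotations

# Hypothetical per-context name indexes (tiny samples, not full name lists).
INDEX = {
    "Glaucophyta": {"Cyanophora"},
    "Rhodophyta": {"Porphyra"},
    "Chlorophyta": {"Chlamydomonas"},
}


def search_across(query: str, contexts: list[str]) -> list[tuple[str, str]]:
    """Query each context separately and merge the (name, context) hits."""
    hits = []
    for context in contexts:
        if query in INDEX[context]:
            hits.append((query, context))
    return hits


def search_union(query: str, contexts: list[str]) -> bool:
    """Single lookup over the union of the contexts, for comparison."""
    return query in set().union(*(INDEX[c] for c in contexts))


groups = ["Glaucophyta", "Rhodophyta", "Chlorophyta"]
print(search_across("Porphyra", groups))  # [('Porphyra', 'Rhodophyta')]
print(search_union("Porphyra", groups))   # True
```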


chinchliff commented 10 years ago

Sorry for the duplicate message, I didn't see Jonathan's reply before I sent mine.

I noticed that Glaucophyta, Chlorophyta, and Rhodophyceae are all quite small, so there is not much advantage to adding them as individual contexts.
