pandurang-kolekar opened this issue 10 years ago
Open Tree doesn't have very many alignments at the moment; we have been focusing on trees. But you could find trees with maximum overlap with your list of species and use that tree as a constraint in downstream analyses. @mtholder is also interested in updating existing phylogenies with new sequence data.
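One way to read "find trees with maximum overlap" is to count shared tip taxa. The sketch below assumes each candidate tree has already been reduced to a set of tip taxon names (the tree ids and taxa are hypothetical inputs, not fetched from any Open Tree service) and picks the tree sharing the most names with the user's species list.

```python
def pick_max_overlap_tree(query_taxa, candidate_trees):
    """Return (tree_id, overlap) for the candidate sharing the most tips with query_taxa.

    candidate_trees: hypothetical dict mapping a tree id to the set of tip taxon
    names in that tree. Ties go to the tree id that sorts first, so the result
    is deterministic. Returns (None, -1) if there are no candidates.
    """
    query = set(query_taxa)
    best_id, best_overlap = None, -1
    for tree_id in sorted(candidate_trees):
        overlap = len(query & candidate_trees[tree_id])
        if overlap > best_overlap:
            best_id, best_overlap = tree_id, overlap
    return best_id, best_overlap


# Illustrative data only.
trees = {
    "tree_A": {"Homo sapiens", "Pan troglodytes", "Gorilla gorilla"},
    "tree_B": {"Homo sapiens", "Mus musculus"},
}
species = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla", "Mus musculus"]
print(pick_max_overlap_tree(species, trees))  # ('tree_A', 3)
```

Name matching here is exact string comparison; real species lists would first need their names reconciled against the taxonomy the trees use.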
Thanks for the suggestions! Trees with maximum overlap would be a good starting point. Based on the taxonomic proximity of the input species, other OTUs can be removed or retained in downstream analyses. I would like to discuss his strategy for this with @mtholder.
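The "remove/retain by taxonomic proximity" step could be sketched as a filter on a shared rank. This assumes a hypothetical mapping from each OTU to its lineage (rank to taxon name), for example taken from a reference taxonomy; it keeps only OTUs that fall in the same group, at the chosen rank, as at least one input species.

```python
def retain_by_rank(otu_lineages, input_species, rank):
    """Keep OTUs whose taxon at `rank` matches that of at least one input species.

    otu_lineages: hypothetical dict mapping OTU name -> {rank: taxon name}.
    input_species: OTU names supplied by the user (must be keys of otu_lineages).
    OTUs whose lineage lacks `rank` are dropped. Returns a sorted list.
    """
    target_groups = {
        otu_lineages[sp][rank] for sp in input_species if rank in otu_lineages[sp]
    }
    return sorted(
        otu for otu, lineage in otu_lineages.items()
        if rank in lineage and lineage[rank] in target_groups
    )


# Illustrative lineages only.
lineages = {
    "Homo sapiens": {"family": "Hominidae", "order": "Primates"},
    "Pan troglodytes": {"family": "Hominidae", "order": "Primates"},
    "Macaca mulatta": {"family": "Cercopithecidae", "order": "Primates"},
    "Mus musculus": {"family": "Muridae", "order": "Rodentia"},
}
print(retain_by_rank(lineages, ["Homo sapiens"], "family"))
# ['Homo sapiens', 'Pan troglodytes']
```

Widening the rank (family to order, and so on) is a simple way to let the user control how far from the input species the retained OTUs may be.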
I think this might be out of scope for opentree. There are other tools that could be useful for this, such as PHLAWD, PhyLoTA, and others you've mentioned, like TreeBASE.
Thanks for the information about PHLAWD, @chinchliff. As far as I know, PhyLoTA archives only eukaryotic genera. Linking it to TreeBASE and to bacterial and viral databases would help broaden its scope.
@pandurang-kolekar You might be interested in PUmPER (http://sco.h-its.org/exelixis/web/software/put/index.html) from the Exelixis lab that seems to do just this.
Sorry, I had missed this thread. I'm happy to chat about this. I won't be at the hackathon in person, but I will be participating remotely (from the Exelixis lab, in fact).
PUmPER (http://sco.h-its.org/exelixis/web/software/put/index.html) works on a similar principle. Thanks, @alexharkess! @mtholder, I will explore PUmPER and then we can chat about this.
@alexharkess @mtholder I read the application note on PUmPER (http://bioinformatics.oxfordjournals.org/content/30/10/1476.long).
To summarize, it allows the user to create a multiple sequence alignment (MSA) from scratch or to extend an existing MSA using PHLAWD. This step requires the gene name(s) and an NCBI taxonomic group as input in the PHLAWD configuration file.
The MSA is then given as input to ExaML or RAxML-Light to infer a phylogenetic tree. The program can be run in standalone or remote mode from the command line.
However, I don't know whether it accepts user-provided sequences that are not available in GenBank. I have emailed the corresponding author of PUmPER to ask about this.
It's available for Linux only. A user-friendly server would help researchers with little or no computational background.
I haven't received a reply from the authors of PUmPER. @mtholder, what are your views on this project idea?
Our aTRAM pipeline might be helpful for this too: it can generate multiple gene alignments across multiple taxa from whole-genome shotgun reads.
I would like to propose the idea of a "Pipeline for customized phylogeny based on user-provided gene/protein sequence(s) using Open Tree of Life data".
Suppose a researcher has newly sequenced a gene or protein from a known species and wishes to carry out a phylogenetic analysis of this sequence together with existing orthologs from nearby taxonomic ranks (genus, family, order, class, etc.). This is a frequent lab exercise. In such cases, researchers compile and curate the ortholog sequence data from relevant databases (GenBank, ENA, Swiss-Prot, UniProt, etc.), then add the newly sequenced gene or protein to this data set and follow a molecular phylogeny protocol. So every time a new gene or protein is sequenced, this time-consuming process of data curation, compilation, and phylogenetic analysis has to be repeated.
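The "add the newly sequenced gene/protein to this data set" step is mechanical enough to automate. The sketch below assumes the curated orthologs are already held as a hypothetical dict of id to sequence and simply merges in the new sequence as FASTA text; alignment and tree inference would still be done by whatever tools the lab already uses.

```python
def build_fasta(orthologs, new_id, new_seq, width=60):
    """Merge curated ortholog sequences with a newly sequenced one into FASTA text.

    orthologs: hypothetical dict mapping sequence id -> sequence string.
    Sequences are upper-cased, stripped of spaces/newlines, and wrapped at `width`.
    Raises ValueError if new_id collides with an existing id.
    """
    if new_id in orthologs:
        raise ValueError(f"duplicate sequence id: {new_id}")
    records = dict(orthologs)  # copy so the curated set is not modified
    records[new_id] = new_seq
    lines = []
    for seq_id, seq in records.items():
        lines.append(f">{seq_id}")
        seq = seq.replace(" ", "").replace("\n", "").upper()
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Illustrative ids and sequences only.
print(build_fasta({"ortholog_1": "acgtacgt"}, "lab_seq_1", "acgtaggt"), end="")
```

Rejecting duplicate ids up front avoids silently overwriting a curated record with the lab sequence.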
I would like to propose an idea to expedite this process using the resources at Open Tree of Life.
This would help characterize and annotate lab sequences and might even help with species assignment or the discovery of new species. It would also save time on such routine analyses.
Resources needed: TreeBASE, Dryad, Arbor, phylogeny packages (e.g., Bio::Phylo). Experts may recommend a few more.
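As a crude illustration of how such a pipeline might flag a lab sequence's closest relative, the sketch below compares an already-aligned query against aligned reference sequences using the uncorrected p-distance (proportion of differing sites, ignoring gapped positions). The reference ids and sequences are hypothetical, and a nearest-distance check like this is only a first screen, not a species-delimitation method.

```python
def p_distance(a, b, gap="-"):
    """Uncorrected p-distance between two aligned sequences, skipping gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ignore sites where either sequence has a gap
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nearest_reference(query, references):
    """Return (reference id, distance) for the closest aligned reference sequence."""
    return min(
        ((ref_id, p_distance(query, seq)) for ref_id, seq in references.items()),
        key=lambda item: (item[1], item[0]),  # ties broken by id
    )


# Illustrative aligned sequences only.
refs = {"ref_sp1": "ACGTACGT", "ref_sp2": "TCGTTCGA"}
print(nearest_reference("ACGTACGA", refs))  # ('ref_sp1', 0.125)
```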