Open anuj2054 opened 8 years ago
Hi, I would like to second that proposal, since I'm in an identical situation, although my dataset runs to ~200 genomes and 50 reference genes. Running a full all-vs-all search with BLAST takes forever; DIAMOND is faster but still takes a long time. For now I have resorted to scripting pairwise OrthoFinder analyses between my reference sequences and each of the genomes and then collating the results, which feels like a dirty hack. I hope such a feature could be incorporated into the program relatively easily. Thank you for your consideration!
Cheers, -Jacek
Hi Anuj and Jacek
I plan to add something that should help you do this. I'm currently working on the paper describing the OrthoFinder functionality added since the first version (trees and orthologues), so I won't be able to start development until later this year. Unfortunately that may be too late for you, but it will be coming!
All the best David
Hi everyone and David,
Given that this issue is not closed and I have to do a similar job to the one described by Anuj and Jacek, I am wondering if anyone (David?) has an idea of best practices when you have a short list of protein sequences and would like to look for orthologues across hundreds of proteomes (gene models from genomes and predicted proteins from transcriptomes). I would appreciate any help on this matter.
Thanks in advance, Felipe
Hi
I think I would take one of two approaches:
I hope this helps,
All the best David
Hi David,
Thanks for sharing your thoughts on how to do this kind of analysis. I will try both approaches.
To share with you (and others): I am searching each proteome (more than 150 species) against my reference proteins with DIAMOND, and will use the BLAST-format output in OrthoFinder. The search step has not finished yet, but it should be done soon.
Regards,
Felipe
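For anyone wanting to reproduce the search step described above, here is a minimal sketch that only builds the DIAMOND command lines (it does not run them). The file names, the `*.fa` extension, the `reference_db` database name and the `diamond_commands` helper are all assumptions made for illustration, not something from this thread, and the output is plain tabular format, which would still need to be arranged however OrthoFinder expects it.

```python
from pathlib import Path


def diamond_commands(reference_fasta, proteome_dir, out_dir, db_name="reference_db"):
    """Build DIAMOND command lines (as argument lists) for searching every
    proteome in `proteome_dir` against a database of the reference proteins.

    Hypothetical helper: paths, the '*.fa' pattern and `db_name` are assumptions.
    """
    # One 'makedb' call turns the reference proteins into a DIAMOND database.
    commands = [["diamond", "makedb", "--in", str(reference_fasta), "--db", db_name]]

    # One 'blastp' call per proteome, written as BLAST tabular output (format 6).
    for proteome in sorted(Path(proteome_dir).glob("*.fa")):
        out_file = Path(out_dir) / f"{proteome.stem}_vs_reference.tsv"
        commands.append([
            "diamond", "blastp",
            "--db", db_name,
            "--query", str(proteome),
            "--out", str(out_file),
            "--outfmt", "6",
        ])
    return commands
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the paths have been checked; keeping command construction separate from execution makes it easy to inspect what would be run on 150+ proteomes first.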
I am not sure whether this comment belongs in this topic, but is there a way to perform a bi-directional best hit (BBH) analysis against a reference genome using OrthoFinder? I am trying to run a constrained analysis of many genomes against a reference, to obtain a dataset showing the presence or absence of each reference protein in each of the genomes.
Thanks
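Independently of OrthoFinder, the bi-directional best hit idea in the question above can be sketched directly from two BLAST/DIAMOND tabular (format 6) files per genome, one for reference-vs-genome and one for genome-vs-reference. This is only an illustration under those assumptions: the function names are hypothetical, "best" is taken to mean highest bit score (column 12), and ties are resolved by keeping the first hit seen.

```python
import csv


def best_hits(blast_tsv_path):
    """Map each query ID to its highest-bitscore subject ID.

    Assumes standard 12-column tabular output (bitscore in column 12).
    """
    best = {}
    with open(blast_tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            query, subject, bitscore = row[0], row[1], float(row[11])
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(ref_vs_genome_tsv, genome_vs_ref_tsv):
    """Return {reference_protein: genome_protein} for pairs that are each
    other's best hit in both search directions."""
    forward = best_hits(ref_vs_genome_tsv)
    backward = best_hits(genome_vs_ref_tsv)
    return {ref: hit for ref, hit in forward.items() if backward.get(hit) == ref}


def presence_absence(reference_ids, rbh_by_genome):
    """Build a presence/absence table: {reference_protein: {genome: bool}}.

    `rbh_by_genome` maps a genome name to its reciprocal_best_hits() result.
    """
    return {
        ref: {genome: ref in rbh for genome, rbh in rbh_by_genome.items()}
        for ref in reference_ids
    }
```

Running `reciprocal_best_hits` once per genome and passing the results to `presence_absence` gives the kind of table asked about. Bear in mind that best-hit approaches can miss orthologues when there are lineage-specific duplications.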
Hi, I am not sure if this issue is still open. Another possible implementation would be to use OrthoDB, where the relationships between proteins and orthogroups are already known, and then align the proteomes against the OGs. Is anyone willing to implement something similar? My genome annotations are still rough drafts, so this approach would be very useful for me.
@faguil how did @davidemms 's suggestion work out for you?
I'm dealing with a similar situation (~20 transcriptomes -> proteomes) and a bunch of proteins of interest. What I've done is just add the proteomes that contain these proteins of interest (e.g., D. melanogaster) as an input alongside my sample proteomes to OrthoFinder. Then I simply pulled out all pairwise orthologs corresponding to the proteins of interest from the Orthologues/ directory.
Can anyone comment if this is an acceptable solution?
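The extraction step described above could look roughly like the sketch below. It assumes each pairwise file under Orthologues/ is tab-separated with a header row and three columns (orthogroup, genes from the reference species, genes from the other species), with multiple genes in a cell separated by commas. Please check that against your own OrthoFinder output, since the layout may differ between versions; the function name is hypothetical.

```python
import csv


def orthologs_for_genes(pairwise_tsv_path, genes_of_interest):
    """Collect orthologs of the given reference genes from one pairwise file.

    Assumed layout (verify against your files): header line, then rows of
    orthogroup <TAB> reference genes <TAB> other-species genes,
    where each gene cell is a comma-separated list.
    Returns {reference_gene: [ortholog, ...]} for genes that were found.
    """
    wanted = set(genes_of_interest)
    found = {}
    with open(pairwise_tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore malformed or empty lines
            ref_genes = [g.strip() for g in row[1].split(",")]
            other_genes = [g.strip() for g in row[2].split(",") if g.strip()]
            for gene in ref_genes:
                if gene in wanted:
                    found.setdefault(gene, []).extend(other_genes)
    return found
```

Looping this over every reference-vs-sample file and merging the dictionaries gives one list of orthologs per protein of interest. Reference genes absent from the result had no ortholog called in that species.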
Hi @vragh,
In the end I took the same approach you describe, and after some parsing of the OrthoFinder output it worked for me. This was done a while ago, so I no longer have the script on hand. Sorry about that.
Best
Hello, I have a set of 100 UniRef proteins. I want to find orthologs of this set of proteins in a set of 44 transcriptomes. Would OrthoFinder help me with this? I know OrthoFinder can find orthogroups among the 44 transcriptomes themselves, but I need orthologs of the 100 UniRef proteins in particular for a phylogenetics study. Thanks, Anuj, University of Oklahoma