Closed Cronzo closed 5 years ago
Hi
It looks like RAxML has failed on the species tree inference step. Have a look at the file WorkingDirectory/Alignments_ids/RAxML_info.SpeciesTree and see what that says.
All the best David
p.s. I'd be interested to know what went wrong with the species tree.
In terms of continuing the run, a few contrary suggestions to consider first.
To finish the trees manually and the get OrthoFinder to do the orthologue inference step from the trees, you'd need to set everything up:
raxmlHPC-AVX -m PROTGAMMALG -p 12345 -s Alignments_ids/OG0000010.fa -n OG0000010 -w Alignments_ids
All the best David
Hi David,
Thank you for the response and suggestions!
Regarding the error message: There was no 'RAxML_info.SpeciesTree' generated; I could only find the respective 'RAxML_info.OG00*' files.
I performed a run using the test data with the same settings (from orthogroups), and had no issue generating a species tree. I should also mention that since I am working with plant genomes, the run did take a good amount of time (450-500 hours) before I encountered this error.
I believe I also failed to mention that my Tree_ids/ directory is already filled with the respective "OG00*_tree_id.txt" files.
In any case, I did follow your instructions and could see that approximately 20,000~/28,000 orthogroups have a RAxML_result file, and when I tried to run RAxML manually, I received the error message:
TOO FEW SPECIES
Since the later orthogroups usually contain >10~ species I figured this might mean that RAxML has finished processing the files (whatever it could process)?
If you do not mind me asking one more question: I am a little confused regarding (4); how do you root the species tree onto the outgroup? I see that there are several rooting algorithms in RAxML (pardon my unfamiliarity with RAxML): does OrthoFinder use the default rooting algorithm -f d (new rapid hill-climbing)
? And if so, how do I generate just one rooted tree with all the Tree_IDs?
Thank you again.
Hi
That's strange that it would say there are too few species. It should be able to do it as long as there are 4 or more genes present. Would you mind sending me an example?
As for rooting the species tree. First find the species tree file, I think it will be WorkingDirectory/SpeciesTree_unrooted_ids.txt if it was successfully inferred from RAxML. If RAxML wasn't successful then I think there will be the input multiple sequence alignment file for it here: WorkingDirectory/SpeciesTreeAlignment.fa
If you know the outgroup for your species (i.e. which individual species or group of species diverged earliest from the rest) then you can do it yourself, Dendroscope is a good tool for that. Use it to open the tree file, select the branch that should be the root, click the button which has a little red triangle pointing to a branch of the tree (this sets the root), then go to File > Export... and save it as a newick file.
If you don't know the outgroup then you can use STRIDE which is the method OrthoFinder uses. It's on my github page and takes as input the unrooted species tree and the Trees_ids/ directory.
All the best David
Hi David,
Thanks for the response again. You are right, those files had less than 4 genes.
This was the next file (by numerical sequence) from the last result file, so RAxML likely completed.
In any case, I managed to generate the rooted species tree using the WorkingDirectory/SpeciesTreeAlignment.fa
(as you suggested) and the rooting option in RAxML. I got the species tree, and renamed it to SpeciesTree_rooted_ids.txt
in the WorkingDirectory as you described above.
However, when I tried to continue the OrthoFinder run (-ft), I receive this strange error:
2019-04-05 06:34:23 : Starting OrthoFinder
5 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm
Traceback (most recent call last):
File "/opt/OrthoFinder/orthofinder/orthofinder.py", line 1592, in <module>
scripts.files.InitialiseFileHandler(options, fastaDir, continuationDir, resultsDir_nonDefault, pickleDir_nonDefault)
File "/opt/OrthoFinder/orthofinder/scripts/files.py", line 825, in InitialiseFileHandler
FileHandler.CreateOutputDirectories(options, pfl, base_dir, fastaDir)
File "/opt/OrthoFinder/orthofinder/scripts/files.py", line 188, in CreateOutputDirectories
user_name=options.name)
File "/opt/OrthoFinder/orthofinder/scripts/files.py", line 153, in StartFromTrees
shutil.copy(speciesTreeFN, self.GetSpeciesTreeIDsRootedFN())
File "/usr/lib64/python2.7/shutil.py", line 119, in copy
copyfile(src, dst)
File "/usr/lib64/python2.7/shutil.py", line 68, in copyfile
if _samefile(src, dst):
File "/usr/lib64/python2.7/shutil.py", line 58, in _samefile
return os.path.samefile(src, dst)
File "/usr/lib64/python2.7/posixpath.py", line 162, in samefile
s1 = os.stat(f1)
TypeError: coercing to Unicode: need string or buffer, NoneType found
Command was: python orthofinder.py -ft ./OrthoFinder/Results_Mar02/ -t 5
It looks as if OrthoFinder was not able to locate some of the required files.
My current results folder has the following files from the previously terminated OrthoFinder run:
Gene_Trees/
(Empty)
Orthologues/
(Empty)
MultipleSequenceAlignments/
(Empty)
Orthogroup_Sequences/
(Contain the FASTA files of the various orthogroups)
WorkingDirectory
(Contains Alignment_ids/
, Tree_ids/
, Sequences_ids/
, SpeciesTree_rooted_ids.txt
)
What are the files that OrthoFinder need to read for the -ft command? I have tried moving the files from Tree_ids/ into Gene_Trees/ but the error still persists.
Thank you again for the help you have given thus far. I really appreciate it.
Cheers
Hi
I think what's probably happened is that because the trees didn't finish, OrthoFinder didn't get to the stage where it writes out to the Log.txt file where the trees can be found. I think it may be as simple as putting a line in the OrthoFinder/Results_Mar02/Log.txt file like this
WorkingDirectory_Trees: OrthoFinder/Results_Mar02/ WorkingDirectory/
but with the full path to the directory rather than a relative path.
Also, I think it's probably worth checking if the tree from RAxML is rooted and how accurate the root it. I don't think RAxML will really have available any good info to use to work out where the tree root is as virtually all substitution models are time-reversible. The STRIDE method will be a lot more accurate because it looks for gene duplication events in the gene trees instead. These are time-irreversible so give a good indication of the root. The '-f d' option looks like it's just for the tree search, it's not to do with the root. Better still, if you know what your outgroup is then it's easiest just to manually re-root it on this using Dendroscope.
All the best David
Hi everyone,
The OrthoFinder run I was performing encountered an error at the species tree inference step. The only logged error message was:
ERROR: An error occurred
ERROR: Species tree inference failed
Context: I was trying to construct a more 'accurate' species tree (of 50 species) using MSA and RAxML. The input came from a previous OrthoFinder run (-fg option using the BLAST output files). This error appeared after (last I checked, the gene tree inference step).
It appears that the MSA and RAxML results (RAxML_info/parsimony/results; OG0000000.fa etc.) were generated in the working directory, but all other directories (GeneTrees, MultipleSequenceAlignments, Orthologues) are empty.
Question: What could be the cause of the error? Due to constraints of time is there anyway I can continue the Orthofinder run (via RAxML or what not) using the the existing files in the working directory?
Thank you very much.