davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Cannot get alignments or gene trees from msa tree_inference_method #112

Closed frogriguez closed 6 years ago

frogriguez commented 6 years ago

Hello. Running the full analysis works (./orthofinder.py -f ExampleData -S diamond -t 16). However, I cannot get OrthoFinder to produce the alignments or gene trees when the -M msa flag is used. Could this be an issue with dlcpar (v1.0)? Options tried:

./orthofinder.py -f ExampleData -t 16 -M msa
./orthofinder.py -f ExampleData -t 16 -M msa -T raxml
./orthofinder.py -f ExampleData -t 16 -S diamond -M msa

I always get errors during the species tree reconciliation:

Analysing each of the potential species tree roots
==================================================
Reconciling gene trees and species tree (root 0)
------------------------------------------------
...
Traceback (most recent call last):
  File "/usr/local/bin/dlcpar_search", line 209, in <module>
    sys.exit(main())
  File "/usr/local/bin/dlcpar_search", line 160, in main
    coal_trees = list(treelib.iter_trees(treefile))
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/rasmus/treelib.py", line 638, in iter_trees
    infile = util.open_stream(treefile)
  File "/usr/local/lib/python2.7/dist-packages/dlcpar/deps/rasmus/util.py", line 1171, in open_stream
    stream = open(filename, mode)
IOError: [Errno 2] No such file or directory: '/home/zach/src/OrthoFinder-1.1.8/ExampleData/Results_Aug29/Orthologues_Aug29/WorkingDirectory/dlcpar/OG0000005_tree_id.txt'
# This error is repeated for every orthogroup (OG), and the rest of the analysis completes, including:
Multiple sequence alignments:
   /home/zach/src/OrthoFinder-1.1.8_source/orthofinder/ExampleData/Results_Aug29_1/Orthologues_Aug29/Alignments

Checking the directories and the dlcpar errors:

ls -lh  /home/zach/src/OrthoFinder-1.1.8_source/orthofinder/ExampleData/Results_Aug29_1/Orthologues_Aug29/Alignments
# all files are size 0
ls ExampleData/Results_Aug29/Orthologues_Aug29/WorkingDirectory/dlcpar/
# none of the *tree_id.txt files exist
cat ExampleData/Orthologues_Aug29/WorkingDirectory/dlcpar/root_errors.txt
# /home/zach/src/OrthoFinder-1.1.8/ExampleData/Results_Aug29/Orthologues_Aug29/WorkingDirectory/Trees_ids/OG0000053_tree_id.txt: Unexisting tree file or Malformed newick tree structure.

I have also tried running the complete analysis without -M msa and then feeding the resulting groups back in:

# full analysis (completes without errors)
./orthofinder.py -f ExampleData -S diamond -t 16
ls ExampleData/Results_Sep05/Orthologues_Sep05/Gene_Trees/
# contains non-empty tree files
ls ExampleData/Results_Sep05/Orthologues_Sep05/WorkingDirectory/dlcpar/
# contains GeneMap.smap, *.coal.recon, *.coal.tree, *.daughters, *.locus.recon, *.txt

When I try to run the analysis from the groups (./orthofinder.py -fg ExampleData/Results_Sep05/ -t 16 -M msa), I get similar errors: IOError: [Errno 2] No such file or directory: '/home/zach/src/OrthoFinder-1.1.8_source/orthofinder/ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/dlcpar/OG0000000_tree_id.txt'

ls -lh ExampleData/Results_Sep05/Orthologues_Sep05_1/Alignments
# contains empty files
ls -lh ExampleData/Results_Sep05/Orthologues_Sep05_1/Gene_Trees
# empty

I have tried this with the binaries and the source_code version, both on the example dataset and on my own. I just can't get the alignments to complete. Any suggestions would be appreciated. ---Zach

davidemms commented 6 years ago

Hi Zach

The files that OrthoFinder will work from will be in the WorkingDirectory: "/home/zach/src/OrthoFinder-1.1.8_source/orthofinder/ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/". Could you check the files in:

  1. ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/Sequences_ids/
  2. ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/Alignments_ids/
  3. ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/Trees_ids/
  4. (I think you've already checked these ones) ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/dlcpar/OG*_tree_id.txt

My guess is that the problem is either occurring with creating the alignments from the sequence files or creating the trees from the alignments. Checking the directories should identify which.
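The directory check described above can be sketched as a small script. This is only an illustration and not part of OrthoFinder: the function names `summarise_stage` and `first_failing_stage` are made up for this example, and it assumes the three intermediate directories listed above sit directly under the WorkingDirectory.

```python
import os

def summarise_stage(directory):
    """Return (total_files, empty_files) for a directory, or None if it does not exist."""
    if not os.path.isdir(directory):
        return None
    paths = [os.path.join(directory, name) for name in sorted(os.listdir(directory))]
    files = [p for p in paths if os.path.isfile(p)]
    empty = [p for p in files if os.path.getsize(p) == 0]
    return len(files), len(empty)

def first_failing_stage(working_dir,
                        stages=("Sequences_ids", "Alignments_ids", "Trees_ids")):
    """Return the first stage whose directory is missing, has no files,
    or contains zero-byte files; return None if every stage looks populated."""
    for stage in stages:
        counts = summarise_stage(os.path.join(working_dir, stage))
        if counts is None or counts[0] == 0 or counts[1] > 0:
            return stage
    return None
```

Calling `first_failing_stage("ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory")` would return `"Alignments_ids"` if the sequence files are fine but the alignments are empty, which points at the alignment step rather than tree inference.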

All the best, David

davidemms commented 6 years ago

The alignment command OrthoFinder will try to run is:

mafft --localpair --maxiterate 1000 --anysymbol WorkingDirectory/Sequences_ids/OG0000000.fa > WorkingDirectory/Alignments_ids/OG0000000.fa

and the tree inference command is:

FastTree WorkingDirectory/Alignments_ids/OG0000000.fa > WorkingDirectory/Trees_ids/OG0000000_tree_id.txt

so you can also check what happens if you try to run them and see if anything goes wrong.
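For anyone who wants to repeat that manual test from a script, here is a minimal sketch that runs a command with its standard output redirected to a file and reports the exit code, the error output and the size of the file written. It is not how OrthoFinder itself calls the tools; `run_to_file` and `check_orthogroup` are hypothetical helper names, and the mafft and FastTree arguments are simply copied from the two commands above. Note that, like those commands, it overwrites the existing alignment and tree files.

```python
import os
import subprocess

def run_to_file(cmd, out_path):
    """Run cmd (a list of arguments) with stdout redirected to out_path.

    Returns (exit_code, stderr_text, output_size_in_bytes). A program that
    cannot be found is reported with exit code 127, as a shell would.
    """
    with open(out_path, "wb") as out:
        try:
            proc = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE)
        except FileNotFoundError as exc:
            return 127, str(exc), 0
    stderr_text = proc.stderr.decode(errors="replace")
    return proc.returncode, stderr_text, os.path.getsize(out_path)

def check_orthogroup(working_dir, og="OG0000000"):
    """Re-run the alignment and tree steps for one orthogroup and stop at the first failure."""
    seq = os.path.join(working_dir, "Sequences_ids", og + ".fa")
    aln = os.path.join(working_dir, "Alignments_ids", og + ".fa")
    tree = os.path.join(working_dir, "Trees_ids", og + "_tree_id.txt")
    steps = [
        (["mafft", "--localpair", "--maxiterate", "1000", "--anysymbol", seq], aln),
        (["FastTree", aln], tree),
    ]
    for cmd, out_path in steps:
        code, err, size = run_to_file(cmd, out_path)
        print("%s: exit code %s, wrote %d bytes to %s" % (cmd[0], code, size, out_path))
        if code != 0 or size == 0:
            print(err)
            return False
    return True
```

A non-zero exit code or a zero-byte output file from the first step suggests the problem is with the alignment program rather than with tree inference or dlcpar.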

David

frogriguez commented 6 years ago

Hi David, Thank you for your reply. The sequence files (1) exist, but the alignment files (2) and tree files (3) were empty. Also, the dlcpar files (4) did not exist.

However, I tried running the alignment command:

mafft --localpair --maxiterate 1000 --anysymbol ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/Sequences_ids/OG0000000.fa > ExampleData/Results_Sep05/Orthologues_Sep05_1/WorkingDirectory/Alignments_ids/OG0000000.fa

and was able to pinpoint the issue. I can run mafft just fine on its own, but when OrthoFinder called it, it had problems writing the output files. I completely removed every version of mafft I could find and reinstalled it, which seems to have fixed the problem. I can now run the complete OrthoFinder analysis with the -M msa flag.

Thank you very much for your help! Zach
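Since the fix involved removing several mafft installations, a quick way to see whether more than one copy is visible is to list every executable of that name on the search path. The sketch below is an illustration only: `find_all_on_path` is a hypothetical helper, and it assumes the conflicting copies are reachable through PATH, which the thread does not confirm. Python's own `shutil.which` reports only the first match, so the helper walks the path entries itself.

```python
import os

def find_all_on_path(name, path=None):
    """Return every executable file called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep regular files that the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches

# The first entry is the copy a shell would normally pick up.
print(find_all_on_path("mafft"))
```

If the list has more than one entry, the copy found when a command is typed by hand may differ from the one another program ends up calling, depending on the environment that program runs in.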