davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0
692 stars 187 forks

Start OF from Gene Trees step with only a subset of orthogroups #488

Open pnatsi opened 3 years ago

pnatsi commented 3 years ago

Hi,

I want to run OrthoFinder from the Gene Trees step onwards, because I am only interested in the duplication history of a subset of orthogroups. However, by default OrthoFinder calculates gene trees for all orthogroups, and this takes a lot of time, especially since I want to use IQ-TREE instead of FastTree, and even more so for the large OG00000* orthogroups.

My question is: is there a way to run OrthoFinder from the Gene Trees step onwards with a set of pre-computed gene trees? And if these pre-computed gene trees cover only a few orthogroups, can the analysis finish with only those? Most importantly, how can I set up OrthoFinder to use only these pre-computed trees: what would the command and the input folder format be?

Best, Paschalis Natsidis UCL

davidemms commented 3 years ago

Hi Paschalis

In general you are fine skipping orthogroups, but if you do this it is probably best to supply a species tree with the "-s" option, since OrthoFinder might have been relying on the orthogroups that you skipped to infer the species tree.

I think there are two ways you could approach this. The recommended way is to use the tree/MSA extensibility provided by the config.json file:

  1. Run OrthoFinder up to the orthogroups stage (command line switch "-og") and identify which orthogroups you are interested in.

  2. Write a wrapper script for the tree inference that takes an input and an output filename. If the orthogroup is one you want, run your chosen tree inference program on it and save the resulting tree to the output filename; otherwise skip it or use a fast tree method on it. If you skip it you don't need to create any output file.

  3. Add an entry in the config.json file for a "program_type": "tree" with the command line to call your wrapper script.

  4. Do the same for your alignments too if you would like to skip these.

  5. Run orthofinder from groups using the options "-fg RESULTS_DIR" and "-M msa -T YOUR_TREE_WRAPPER -A YOUR_MSA_WRAPPER -s SPECIES_TREE"
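The wrapper in step 2 could look something like the following minimal Python sketch. It assumes that the orthogroup ID (e.g. OG0000003) appears in the input filename OrthoFinder passes in, that IQ-TREE 2 is on the PATH as `iqtree2`, and that `WANTED` holds the orthogroups picked after the "-og" run. `WANTED`, `orthogroup_id`, `build_tree_command` and `run` are hypothetical names, not part of OrthoFinder.

```python
import os
import re
import shutil
import subprocess
import sys

# Hypothetical: the orthogroups chosen after the "-og" run.
WANTED = {"OG0000003", "OG0000017"}

# Assumption: orthogroup IDs look like "OG" followed by digits.
OG_PATTERN = re.compile(r"OG\d+")


def orthogroup_id(path):
    """Return the orthogroup ID found in the file name, or None."""
    match = OG_PATTERN.search(os.path.basename(path))
    return match.group(0) if match else None


def build_tree_command(alignment_fn, prefix):
    """IQ-TREE 2 command line (binary name and flags are assumptions)."""
    return ["iqtree2", "-s", alignment_fn, "-pre", prefix, "-quiet"]


def run(input_fn, output_fn, wanted=WANTED):
    """Infer a tree for wanted orthogroups; return True if a tree was written."""
    if orthogroup_id(input_fn) not in wanted:
        # Skipped orthogroups need no output file.
        return False
    prefix = output_fn + ".iqtree"
    subprocess.run(build_tree_command(input_fn, prefix), check=True)
    # IQ-TREE writes the ML tree to <prefix>.treefile; move it to the
    # output filename OrthoFinder asked for.
    shutil.move(prefix + ".treefile", output_fn)
    return True


if __name__ == "__main__":
    run(sys.argv[1], sys.argv[2])
```

The config.json entry from step 3 would then call something like `python tree_wrapper.py INPUT OUTPUT`; check the existing entries in your OrthoFinder config.json for the exact placeholder names it substitutes.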

The alternative is to run OrthoFinder up to the point where it writes the sequence files, then run the alignments and trees yourself and put them where OrthoFinder expects them:

  1. Run to sequences: "-os"

  2. Infer the alignments & trees you want on the files in "WorkingDirectory/Sequences_ids/". Save the trees in WorkingDirectory/Trees_ids/

  3. Start OrthoFinder from these trees: "-ft RESULTS_DIR -s SPECIES_TREE"
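For step 2, a small Python helper can pair each selected sequence file with the tree file it should produce. This is a sketch under assumptions: it assumes sequence files are named like `OG0000003.fa` in `Sequences_ids/` and that OrthoFinder looks for trees named like `OG0000003_tree_id.txt` in `Trees_ids/`; confirm both naming patterns against a completed run on the Example Dataset. `plan_tree_jobs` is a hypothetical helper.

```python
import os


def plan_tree_jobs(working_dir, wanted):
    """List (sequence_file, tree_file) pairs for the wanted orthogroups.

    Assumed layout (check against a finished run):
      <working_dir>/Sequences_ids/OGxxxxxxx.fa         (input)
      <working_dir>/Trees_ids/OGxxxxxxx_tree_id.txt    (output)
    """
    seq_dir = os.path.join(working_dir, "Sequences_ids")
    tree_dir = os.path.join(working_dir, "Trees_ids")
    jobs = []
    for name in sorted(os.listdir(seq_dir)):
        og, ext = os.path.splitext(name)
        # Ignore non-FASTA files and orthogroups we are not interested in.
        if ext != ".fa" or og not in wanted:
            continue
        jobs.append((os.path.join(seq_dir, name),
                     os.path.join(tree_dir, og + "_tree_id.txt")))
    return jobs
```

Each pair can then be handed to your aligner and tree program, with the species tree still supplied via "-s" in step 3.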

For both of these, I'd recommend testing the workflow out first on the Example Dataset.

All the best David

pnatsi commented 3 years ago

Hi David,

I tried the second alternative: I put my trees in WorkingDirectory/Trees_ids and ran OrthoFinder with -ft and -s.

I got the following error:

OrthoFinder version 2.4.0 Copyright (C) 2014 David Emms

2020-12-15 09:39:16 : Starting OrthoFinder

40 thread(s) for highly parallel tasks (BLAST searches etc.)

1 thread(s) for OrthoFinder algorithm

Checking required programs are installed


Test can run "fastme -i /SAN/telfordlab/paratomella_et_al/tools/OrthoFinder/ExampleData/OrthoFinder/Results_Dec15/WorkingDirectory/SimpleTest.phy -o /SAN/telfordlab/paratomella_et_al/tools/OrthoFinder/ExampleData/OrthoFinder/Results_Dec15/WorkingDirectory/SimpleTest.tre" - ok

Running Orthologue Prediction

=============================

Reconciling gene and species trees


2020-12-15 09:39:16 : Starting OF Orthologues

Traceback (most recent call last):

File "orthofinder.py", line 7, in

File "scripts_of/main.py", line 1761, in main

File "scripts_of/main.py", line 1517, in GetOrthologues_FromTrees

File "scripts_of/orthologues.py", line 877, in OrthologuesFromTrees

File "scripts_of/orthologues.py", line 851, in ReconciliationAndOrthologues

File "scripts_of/trees2ologs_of.py", line 815, in DoOrthologuesForOrthoFinder

File "scripts_of/files.py", line 374, in GetOGsTreeFN

TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

[137708] Failed to execute script orthofinder

Any ideas what might have gone wrong?

Paschalis


From: David Emms notifications@github.com Sent: Wednesday, December 9, 2020 8:19 PM To: davidemms/OrthoFinder OrthoFinder@noreply.github.com Cc: Paschalis Natsidis pnatsidis@hotmail.com; Author author@noreply.github.com Subject: Re: [davidemms/OrthoFinder] Start OF from Gene Trees step with only a subset of orthogroups (#488)


davidemms commented 3 years ago

I think if you add a line to your Log.txt file like this:

WorkingDirectory_Trees: /home/emms/NOBACKUP/ExampleDataset/OrthoFinder/Results_STANDARD/WorkingDirectory/

pointing to the correct WorkingDirectory, that might be enough to get it to run. I've not tried this hack myself, so have a look at what the Log.txt file looks like after running OrthoFinder to completion on the Example Dataset, and compare it with yours.
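If you want to try this untested hack without editing the file by hand, a minimal Python sketch could append the line only when Log.txt does not already contain a `WorkingDirectory_Trees:` entry. It assumes Log.txt sits at the top of the results directory and uses the same `Key: value` format as the example line; `add_trees_dir_to_log` is a hypothetical helper, and whether OrthoFinder then picks up the directory is not confirmed.

```python
import os


def add_trees_dir_to_log(results_dir, working_dir):
    """Append 'WorkingDirectory_Trees: <dir>/' to Log.txt unless present.

    Returns True if a line was added, False if an entry already existed.
    """
    log_fn = os.path.join(results_dir, "Log.txt")
    with open(log_fn) as fh:
        text = fh.read()
    if any(line.startswith("WorkingDirectory_Trees:")
           for line in text.splitlines()):
        return False
    # os.path.join(dir, "") adds the trailing slash used in the example line.
    entry = "WorkingDirectory_Trees: " + os.path.join(working_dir, "") + "\n"
    # Avoid gluing the new entry onto a last line that lacks a newline.
    if text and not text.endswith("\n"):
        entry = "\n" + entry
    with open(log_fn, "a") as fh:
        fh.write(entry)
    return True
```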

All the best David

VaninaTonzo commented 2 years ago

Hi David and Paschalis, I know this is an old issue, but I don't know whether it has been solved.

I am trying to do the same thing:

  1. Run OF on my whole dataset up to "-os"
  2. Infer the alignments and trees and, in parallel, my own species tree
  3. Select some of the OGs and re-run OF with these OGs' gene trees and my species tree, like this:

./orthofinder -ft /home/Desktop/Orthofinder/Results_dir/ -s /home/Desktop/Orthofinder/species_tree.txt -x /home/Desktop/Speciesinfofilename -t 8

Also, as you suggested, I added the following lines to the Log.txt file generated in the first OF run:

WorkingDirectory_Base: /home/Desktop/Orthofinder/Results_dir/WorkingDirectory/
WorkingDirectory_Trees:/home/Desktop/Orthofinder/Results_dir/WorkingDirectory/

Finally, I added -x to obtain the OrthoXML file too, but I got this error:

OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

2022-01-27 13:09:10 : Starting OrthoFinder 2.5.4

8 thread(s) for highly parallel tasks (BLAST searches etc.)

1 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "fastme -i /home/vaninat/Desktop/Orthofinder/Results_Jul26_mod/WorkingDirectory/SimpleTest.phy -o /home/vaninat/Desktop/Orthofinder/Results_Jul26_mod/WorkingDirectory/SimpleTest.tre" - ok

Running Orthologue Prediction

Reconciling gene and species trees

2022-01-27 13:09:20 : Starting OF Orthologues

Traceback (most recent call last):

File "orthofinder.py", line 7, in

File "scripts_of/main.py", line 1806, in main

File "scripts_of/main.py", line 1562, in GetOrthologues_FromTrees

File "scripts_of/orthologues.py", line 901, in OrthologuesFromTrees

File "scripts_of/orthologues.py", line 871, in ReconciliationAndOrthologues

File "scripts_of/trees2ologs_of.py", line 1104, in DoOrthologuesForOrthoFinder

File "scripts_of/trees2ologs_of.py", line 1205, in AnalyseTree

UnboundLocalError: local variable 'og_name' referenced before assignment

[609114] Failed to execute script orthofinder

Any new suggestions?

Thanks in advance!!!

Vanina