davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Running OrthoFinder algorithm. Initial processing of each species ERROR: Blast0_0.txt is corrupted #729

Open cd791 opened 2 years ago

cd791 commented 2 years ago

Dear David,

I installed OrthoFinder v2.5.4 via conda. However, when I run it on the ExampleData, I get the following error:

OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

2022-08-11 10:01:35 : Starting OrthoFinder 2.5.4
8 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed
Test can run "mcl -h" - ok
Test can run "fastme -i /home/cd791/orthofinder_tutorial/OrthoFinder/ExampleData/OrthoFinder/Results_Aug11_11/WorkingDirectory/SimpleTest.phy -o /home/cd791/orthofinder_tutorial/OrthoFinder/ExampleData/OrthoFinder/Results_Aug11_11/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing
2022-08-11 10:01:36 : Creating diamond database 1 of 4
2022-08-11 10:01:36 : Creating diamond database 2 of 4
2022-08-11 10:01:36 : Creating diamond database 3 of 4
2022-08-11 10:01:36 : Creating diamond database 4 of 4

Running diamond all-versus-all
Using 8 thread(s)
2022-08-11 10:01:36 : This may take some time....
2022-08-11 10:01:36 : Done 0 of 16
2022-08-11 10:01:51 : Done all-versus-all sequence search

Running OrthoFinder algorithm
2022-08-11 10:01:52 : Initial processing of each species
ERROR: Blast0_0.txt is corrupted
Malformatted line in /home/cd791/orthofinder_tutorial/OrthoFinder/ExampleData/OrthoFinder/Results_Aug11_11/WorkingDirectory/Blast0_0.txt
Offending line was:

ERROR: Error processing files Blast0_
Process Process-10:
Traceback (most recent call last):
  File "/home/cd791/miniconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/cd791/miniconda3/envs/orthofinder/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cd791/miniconda3/envs/orthofinder/bin/scripts_of/__main__.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/home/cd791/miniconda3/envs/orthofinder/bin/scripts_of/__main__.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/home/cd791/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
_csv.Error: line contains NUL
ERROR: An error occurred, ***please review the error messages*** they may contain useful information about the problem.
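The traceback ends in Python's csv reader rejecting a line that contains a NUL byte, which is also why "Offending line was:" prints nothing visible. As a diagnostic, here is a minimal sketch that scans a results file for NUL bytes and for rows without the expected number of columns. It assumes the file is plain (uncompressed) tab-separated BLAST tabular output with 12 columns; `scan_blast_file` is a hypothetical helper, not part of OrthoFinder.

```python
EXPECTED_COLUMNS = 12  # assumption: standard 12-column BLAST tabular layout


def scan_blast_file(path):
    """Return (line_number, problem) pairs for suspicious lines in a BLAST tabular file.

    Hypothetical diagnostic helper. It reads raw bytes, so NUL characters are
    reported instead of raising, and it flags non-blank lines whose
    tab-separated field count differs from EXPECTED_COLUMNS.
    """
    problems = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            if b"\x00" in raw:
                problems.append((lineno, "contains NUL byte"))
                continue
            line = raw.rstrip(b"\r\n")
            if not line:
                continue  # ignore blank lines
            n_fields = len(line.split(b"\t"))
            if n_fields != EXPECTED_COLUMNS:
                problems.append((lineno, f"{n_fields} fields, expected {EXPECTED_COLUMNS}"))
    return problems
```

Run from the WorkingDirectory, `scan_blast_file("Blast0_0.txt")` would list the offending line numbers; an empty list suggests the file is well-formed at the text level.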

It also creates a diamond_output.txt.gz file in the ~/orthofinder_tutorial/OrthoFinder directory, which contains lines like these:

| gi|290752891|emb|CBH40866.1| 23.1 507 316 23 7 462 2 485 1.9e-09 56.2
gi|290752280|emb|CBH40251.1| gi|290752893|emb|CBH40868.1| 24.1 237 138 7 11 224 369 586 9.3e-09 53.9
gi|290752280|emb|CBH40251.1| gi|290752592|emb|CBH40564.1| 25.0 220 154 7 7 223 357 568 6.0e-08 51.2
gi|290752280|emb|CBH40251.1| gi|290752391|emb|CBH40362.1| 22.8 162 108 4 5 153 10 167 1.0e-07 50.4
gi|290752280|emb|CBH40251.1| gi|290752979|emb|CBH40955.1| 21.1 261 136 8 29 223 36 292 2.3e-07 49.3
gi|290752280|emb|CBH40251.1| gi|290752668|emb|CBH40641.1| 30.0 100 63 3 117 209 798 897 2.8e-05 42.4
gi|290752280|emb|CBH40251.1| gi|290752491|emb|CBH40463.1| 42.1 38 22 0 28 65 37 74 1.1e-04 40.4
gi|290752280|emb|CBH40251.1| gi|290752373|emb|CBH40344.1| 37.3 51 32 0 20 70 28 78 1.8e-04 39.7
gi|290752281|emb|CBH40252.1| gi|290752281|emb|CBH40252.1| 100.0 662 0 0 1 662 1 662 0.0e+00 1117.8
gi|290752282|emb|CBH40253.1| gi|290752282|emb|CBH40253.1| 100.0 325 0 0 1 325 1 325 2.8e-181 626.3
gi|290752283|emb|CBH40254.1| gi|290752283|emb|CBH40254.1| 100.0 220 0 0 1 220 1 220 5.3e-128 448.7
gi|290752283|emb|CBH40254.1| gi|290752284|emb|CBH40255.1| 50.7 213 104 1 1 213 1 212 1.1e-59 221.9
gi|290752284|emb|CBH40255.1| gi|290752284|emb|CBH40255.1| 100.0 217 0 0 1 217 1 217 1.2e-124 437.6
gi|290752284|emb|CBH40255.1| gi|290752283|emb|CBH40254.1| 50.7 213 104 1 1 212 1 213 4.0e-59 219.9
gi|290752285|emb|CBH40256.1| gi|290752285|emb|CBH40256.1| 100.0 621 0 0 1 621 1 621 0.0e+00 1138.6
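Each row in that output is one hit in BLAST tabular format. Assuming DIAMOND's default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), here is a sketch that parses a row into named, typed fields. `Blast6Hit` and `parse_blast6_line` are hypothetical names, and OrthoFinder's own reader may differ.

```python
from typing import NamedTuple


class Blast6Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# One converter per column, in the same order as the fields above.
_CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_blast6_line(line, sep="\t"):
    """Parse one tabular hit; sep=None splits on any whitespace instead of tabs."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != len(_CONVERTERS):
        raise ValueError(f"expected {len(_CONVERTERS)} fields, got {len(fields)}: {line!r}")
    return Blast6Hit(*(convert(value) for convert, value in zip(_CONVERTERS, fields)))
```

A row that fails to parse here (wrong field count, or a non-numeric value in a numeric column) is a good candidate for the "Malformatted line" that OrthoFinder reports.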

Do you have any advice on these issues?

Thanks.

Best regards, Chiara

sdtruong commented 1 year ago

I'm also having this issue. Have you found a solution?

Edit 1:

After looking into this further, the problem appears to be related to DIAMOND. I was able to partially fix it by running conda install -c bioconda diamond=0.9.4.

However, I now have another issue, and I'm not sure whether it's related:

Reconciling gene trees and species tree
---------------------------------------
Outgroup: Mycoplasma_hyopneumoniae
2022-10-22 23:57:40 : Starting Recon and orthologues
2022-10-22 23:57:40 : Starting OF Orthologues
Traceback (most recent call last):
  File "/Users/user/opt/anaconda3/envs/longenv/bin/Orthofinder", line 7, in <module>
    main(args)
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/__main__.py", line 1778, in main
    GetOrthologues(speciesInfoObj, options, prog_caller)
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/__main__.py", line 1540, in GetOrthologues
    orthologues.OrthologuesWorkflow(speciesInfoObj.speciesToUse, 
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/orthologues.py", line 1090, in OrthologuesWorkflow
    ReconciliationAndOrthologues(recon_method, db.ogSet, nHighParallel, nLowParallel, i if qMultiple else None, stride_dups=stride_dups, q_split_para_clades=q_split_para_clades) 
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/orthologues.py", line 870, in ReconciliationAndOrthologues
    nOrthologues_SpPair = trees2ologs_of.DoOrthologuesForOrthoFinder(ogSet, species_tree_rooted_labelled, trees2ologs_of.GeneToSpecies_dash, 
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/trees2ologs_of.py", line 1123, in DoOrthologuesForOrthoFinder
    nOrthologues_SpPair = RunOrthologsParallel(ta, len(ogSet.speciesToUse), args_queue, n_parallel)
  File "/Users/user/opt/anaconda3/envs/longenv/bin/scripts_of/trees2ologs_of.py", line 1276, in RunOrthologsParallel
    proc.start()
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/user/opt/anaconda3/envs/longenv/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

Edit 2: This seems to be another version issue. Downgrading to python=3.7 fixed the error above, and I was able to get everything to work. I hope this comment helps!
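A likely explanation for why downgrading Python helps with this particular error: the traceback goes through popen_spawn_posix, so the child process is started with the "spawn" method, which pickles the Process object and its arguments, and open file objects cannot be pickled. Since Python 3.8 the default start method on macOS is "spawn" (in 3.7 it was "fork", which does not pickle the arguments). The snippet below only reproduces the same TypeError in isolation; `can_pickle` is a hypothetical helper, not OrthoFinder code.

```python
import os
import pickle


def can_pickle(obj):
    """Return True if obj can be pickled, which the 'spawn' start method
    requires for everything passed to a child Process."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# An open text-mode file is an _io.TextIOWrapper, the type named in the traceback.
with open(os.devnull) as handle:
    print(type(handle).__name__, can_pickle(handle))  # TextIOWrapper False

# Plain data such as dicts and lists pickles without trouble.
print(can_pickle({"species": [0, 1, 2]}))  # True
```

On a given interpreter, multiprocessing.get_start_method() reports which method is in use.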

cd791 commented 1 year ago

Hi sdtruong,

Here is how I fixed my issue:

OrthoFinder was built for Python 2.7, so it is incompatible with the Python 3.10 installed on the hydrogen cluster.

For reference, see:

https://github.com/davidemms/OrthoFinder/issues/328
https://github.com/bioconda/bioconda-recipes/pull/20155/files

Check the Python version, create a py27 env, and install Python 2.7.6

python --version
Python 3.10.4

Create a conda env for py27 (I use a small conda env for each package to keep things tidy)

conda create -n py27

conda install -n py27 -c anaconda python=2.7.6
conda activate py27

Install DIAMOND v0.9.24, as suggested on GitHub

conda activate py27
conda install -c bioconda diamond=0.9.24

Install OrthoFinder in the py27 env

conda install -n py27 -c bioconda orthofinder

Test that OrthoFinder is properly installed

orthofinder -h

Run OrthoFinder on the ExampleData

cd ~/orthofinder_tutorial/OrthoFinder
orthofinder -f ExampleData/

The results are in /home/cd791/orthofinder_tutorial/OrthoFinder/ExampleData/OrthoFinder/Results_Aug12_1/WorkingDirectory/

VaninaTonzo commented 1 year ago

Hi David,

I am having the same problem. I tried what cd791 suggested, but nothing works.

I have 9 species and am working with DNA, in case that matters.

Best,

V

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i OrthoFinder/Results_Feb08/WorkingDirectory/SimpleTest.phy -o OrthoFinder/Results_Feb08/WorkingDirectory/SimpleTest.tre" - ok
Traceback (most recent call last):
  File "/uufs/chpc.utah.edu/common/home/u6044365/.conda/envs/py27/bin/orthofinder", line 7, in <module>
    main(args)
  File "/uufs/chpc.utah.edu/common/home/u6044365/.conda/envs/py27/bin/scripts_of/__main__.py", line 1781, in main
    speciesInfoObj, = ProcessPreviousFiles(files.FileHandler.GetWorkingDirectory1_Read(), options.qDoubleBlast)
  File "/uufs/chpc.utah.edu/common/home/u6044365/.conda/envs/py27/bin/scripts_of/__main__.py", line 1453, in ProcessPreviousFiles
    blast_fns_triangular = [files.FileHandler.GetBlastResultsFN(iSpecies, jSpecies) for iSpecies in speciesInfo.speciesToUse for jSpecies in speciesInfo.speciesToUse if jSpecies >= iSpecies]
  File "/uufs/chpc.utah.edu/common/home/u6044365/.conda/envs/py27/bin/scripts_of/__main__.py", line 1453, in <listcomp>
    blast_fns_triangular = [files.FileHandler.GetBlastResultsFN(iSpecies, jSpecies) for iSpecies in speciesInfo.speciesToUse for jSpecies in speciesInfo.speciesToUse if jSpecies >= iSpecies]
  File "/uufs/chpc.utah.edu/common/home/u6044365/.conda/envs/py27/bin/scripts_of/files.py", line 322, in GetBlastResultsFN
    raise Exception(fn + " not found")
Exception: /uufs/chpc.utah.edu/common/home/gompert-group3/projects/troglo_SPY/03_consensus/OF_consensus/OrthoFinder/Results_Feb08/WorkingDirectory/Blast0_0.txt not found
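This error is different from the corrupted-file one: OrthoFinder is looking for existing BLAST results in the working directory and cannot find Blast0_0.txt. The `blast_fns_triangular` comprehension in the traceback expects one file for every species pair (i, j) with j >= i. Here is a sketch for checking which of those files are present before rerunning. It assumes the BlastI_J.txt naming from the error message and species IDs passed in explicitly; `missing_blast_files` is a hypothetical helper, not OrthoFinder's API.

```python
from pathlib import Path


def missing_blast_files(working_dir, species_ids):
    """List BlastI_J.txt files (for pairs with j >= i) that are absent from working_dir.

    Mirrors the blast_fns_triangular comprehension in the traceback;
    the file-name pattern is taken from the error message.
    """
    wd = Path(working_dir)
    expected = [
        wd / f"Blast{i}_{j}.txt"
        for i in species_ids
        for j in species_ids
        if j >= i
    ]
    return [p for p in expected if not p.exists()]
```

With 9 species (IDs 0 to 8), this checks 45 files; if even Blast0_0.txt is missing, the search step most likely never wrote its output to that directory.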

cst-ramirez commented 1 year ago

I have the same problem running OrthoFinder 2.5.4 on WSL2. For the most part, sdtruong's explanation works for me. The problem seems to originate here:

  File "/home/<username>/miniconda3/envs/orthofinder/bin/scripts_of/newick.py", line 208, in read_newick
    nw = open(newick, 'rU').read()

Since 'U' is no longer a valid mode in recent Python versions (it was deprecated in Python 3.3 and removed in Python 3.11), downgrading to Python 3.7.12 (the most recent version of Python 3.7 at the time of writing) solved this problem for me. There was no need to downgrade DIAMOND or OrthoFinder: I was able to run OrthoFinder 2.5.4 with DIAMOND 2.1.6 (both the most recent versions at the time of writing).
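In Python 3, text mode already translates '\r\n' and '\r' line endings to '\n' by default (newline=None), which is what the old 'U' flag requested. So reading with plain 'r' should behave the same for tree files, while 'rU' itself is rejected from Python 3.11 on. Here is a small check using a throwaway temporary file; `read_text` is a hypothetical helper, not OrthoFinder's newick.py.

```python
import os
import sys
import tempfile


def read_text(path):
    # newline=None (the default) enables universal-newline translation on read.
    with open(path, "r") as fh:
        return fh.read()


with tempfile.NamedTemporaryFile("wb", suffix=".nwk", delete=False) as tmp:
    tmp.write(b"(A,B);\r\n(C,D);\r\n")  # Windows-style line endings
    tree_path = tmp.name

try:
    print(repr(read_text(tree_path)))  # '(A,B);\n(C,D);\n'
    if sys.version_info >= (3, 11):
        try:
            open(tree_path, "rU")
        except ValueError as exc:
            print("'rU' rejected:", exc)
finally:
    os.remove(tree_path)
```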

Alternatively, you could try editing the source code to replace the 'rU' mode with 'r', but I was too lazy to do that.
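For anyone who prefers the source-edit route, here is a minimal sketch. It assumes the mode string appears literally as 'rU' (or "rU") in newick.py, as in the traceback line quoted earlier in this comment. `patch_ru_mode` is a hypothetical helper. It rewrites every quoted 'rU' literal in the file, not only open() calls, and keeps a .bak copy, so review the diff before relying on it. Point it at your own scripts_of/newick.py.

```python
import re
import shutil
from pathlib import Path

# Matches the quoted literal 'rU' or "rU" (e.g. in open(newick, 'rU')).
_RU_MODE = re.compile(r"""(['"])rU\1""")


def patch_ru_mode(path):
    """Replace quoted 'rU' literals with 'r' in path; return the number of replacements."""
    path = Path(path)
    source = path.read_text()
    patched, count = _RU_MODE.subn(r"\1r\1", source)
    if count:
        # Keep a backup next to the original, e.g. newick.py.bak.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(patched)
    return count
```

Running it a second time returns 0, since no 'rU' literals remain.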