davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

"A duplicate accession was found..." error #7

drelo closed this issue 8 years ago.

drelo commented 9 years ago

Hi again, I hit this error while running phase 1 first:

[I removed a bunch of the output to fit it here...]

1. Checking required programs are installed

Test can run "makeblastdb -help" - ok
Test can run "blastp -help" - ok
Test can run "mcl -h" - ok

2. Temporarily renaming sequences with unique, simple identifiers

Done

3. Dividing up work for BLAST for parallel processing

3a. Creating BLAST databases

4. Running BLAST all-versus-all

Maximum number of BLAST processer: 11
2015-10-19 12:30:51.644003 : This may take some time.... Done!

5. Running OrthoFinder algorithm

2015-10-22 02:51:58.743852 : Started
2015-10-22 02:51:59.966677 : Got sequence lengths
2015-10-22 02:51:59.966706 : Initial processing of each species


2015-10-22 07:47:18.320688 : Writen final scores for species 15 to graph file
[mclIO] reading <sixteen/Results_Oct19/WorkingDirectory/OrthoFinder_v0.2.8_graph.txt>
.......................................
[mclIO] read native interchange 517135x517135 matrix with 7956242 entries
[mcl] pid 27111
ite ------------------- chaos time hom(avg,lo,hi) expa expb expc fmv
1 ................... 121.04 5.24 1.00/0.03/9.22 2.88 2.52 2.52 0

42 ................... 0.00 0.13 1.00/1.00/1.00 1.00 1.00 0.08 0
[mcl] cut <14> instances of overlap
[mcl] jury pruning marks: <93,92,94>, out of 100
[mcl] jury pruning synopsis: <92.9 or scrumptious> (cf -scheme, -do log)
[mclIO] writing <sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt>
.......................................
[mclIO] wrote native interchange 517135x99346 matrix with 517135 entries to stream <sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt>
[mcl] 99346 clusters found
[mcl] output is in sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt

Please cite:

2015-10-19 12:30:52.010419 : Running command: blastp -outfmt 6 -evalue 0.001 -query

sixteen/Results_Oct19/WorkingDirectory/Blast10_4.txt
2015-10-20 13:31:28.710983 : Running command: blastp -outfmt 6 -evalue 0.001 -query sixteen/Results_Oct19/WorkingDirectory/Species4.fa -db sixteen/Results_Oct19/WorkingDirectory/BlastDBSpecies10 -out sixteen/Results_Oct19/WorkingDirectory/Blast4_10.txt
2015-10-20 16:23:13.283020 : Finished command: blastp -outfmt 6 -evalue 0.001 -query sixteen/Results_Oct19/WorkingDirectory/Species4.fa -db sixteen/Results_Oct19/WorkingDirectory/BlastDBSpecies10 -out sixteen/Results_Oct19/WorkingDirectory/Blast4_10.txt
2015-10-20 16:23:13.283062 : Running command
2015-10-22 07:50:39.753291 : Ran MCL

6. Creating files for Orthologous Groups

When publishing work that uses OrthoFinder please cite: D.M. Emms & S. Kelly (2015), OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy, Genome Biology 16:157.

A duplicate accession was found using just first part: TR31292|c0_g1_i1|m.23117
Tried to use only the first part of the accession in order to list the sequences in each orthologous group more concisely but these were not unique. Will use the full accession line instead.

Orthologous groups have been written to tab-delimited files:
sixteen/Results_Oct19/OrthologousGroups.csv
sixteen/Results_Oct19/OrthologousGroups_UnassignedGenes.csv
And in OrthoMCL format:
sixteen/Results_Oct19/OrthologousGroups.txt
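The step 6 message above describes a fallback: label each sequence by the first word of its accession line, and if those first words are not unique, use the full accession line instead. A minimal Python sketch of that logic follows. The function names `first_word` and `choose_display_ids`, and the input (a plain list of accession lines), are hypothetical illustrations, not OrthoFinder's actual code.

```python
from __future__ import annotations


def first_word(accession: str) -> str:
    """Return the first whitespace-delimited token of an accession line."""
    parts = accession.split(None, 1)
    return parts[0] if parts else accession


def choose_display_ids(accessions: list[str]) -> tuple[list[str], bool]:
    """Hypothetical sketch of the 'first part, else full line' fallback.

    Returns the IDs to display and True if the short first-word form was used.
    """
    short_ids = [first_word(acc) for acc in accessions]
    if len(set(short_ids)) == len(short_ids):
        # First words are unique: the concise form is safe to use.
        return short_ids, True
    # At least two sequences share a first word: fall back to full lines.
    return list(accessions), False
```

For example, `choose_display_ids(["seqA sample1", "seqA sample2"])` falls back to the full lines because both share the first word `seqA`. The traceback further down suggests the tree-building step did not apply an equivalent fallback and raised an error instead.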

Then, when I ran the step that generates the alignments and trees:

OrthoFinder Alignments and Trees version 0.2.8 Copyright (C) 2015 David Emms

This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under certain conditions.
For details please see the License.md that came with this software.

Generating trees for orthogroups in file: sixteen/Results_Oct19/OrthologousGroups.txt

Using 11 threads for alignments and trees

Traceback (most recent call last):
  File "trees_for_orthogroups.py", line 310, in
    idDict = GetIDsDict(orthofinderWorkingDir)
  File "trees_for_orthogroups.py", line 235, in GetIDsDict
    idExtract = orthofinder.FirstWordExtractor(orthofinderWorkingDir + "SequenceIDs.txt")
  File "/home/compartido2/andres/OrthoFinder/orthofinder.py", line 159, in __init__
    raise RuntimeError("A duplicate accession was found using just first part: % s" % accession)
RuntimeError: A duplicate accession was found using just first part: TR31292|c0_g1_i1|m.23117

And then it stopped. How can I fix this? Thanks in advance.
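To see which sequences trigger this error, it can help to scan the input FASTA files and list every header whose first word occurs more than once, along with where it occurs. A minimal sketch, assuming plain FASTA files with `>` header lines. `find_duplicate_first_words` is a hypothetical helper, not part of OrthoFinder, and it reads each file fully into memory for simplicity.

```python
import sys
from collections import defaultdict


def find_duplicate_first_words(named_lines):
    """Map each repeated header first word to the labels (e.g. file names) it occurs in.

    named_lines: dict of label -> iterable of FASTA lines (hypothetical input shape).
    """
    locations = defaultdict(list)
    for label, lines in named_lines.items():
        for line in lines:
            if line.startswith(">"):
                header = line[1:].strip()
                parts = header.split(None, 1)
                word = parts[0] if parts else header
                locations[word].append(label)
    # Keep words seen more than once, whether within one file or across files.
    return {word: labels for word, labels in locations.items() if len(labels) > 1}


def read_lines(path):
    """Read a whole file into a list of lines."""
    with open(path) as handle:
        return handle.readlines()


if __name__ == "__main__":
    # Usage: python <script> species1.fa species2.fa ...
    inputs = {path: read_lines(path) for path in sys.argv[1:]}
    for word, labels in sorted(find_duplicate_first_words(inputs).items()):
        print(word, *labels, sep="\t")
```

Each output line shows a duplicated first word followed by the files it was found in, which tells you whether the clash is within one species' file or between species.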

davidemms commented 8 years ago

Hi, duplicate accessions should have been handled when generating trees. I've fixed the script so that it now deals with this case correctly. To use the fix, please download both the updated orthofinder.py and trees_for_orthogroups.py files.

drelo commented 8 years ago

Thanks, I am trying this new version right now. I will redo the analyses from scratch then.