drelo closed this issue 8 years ago
Hi, duplicate accessions should have been dealt with when generating trees. I've fixed the script so that it now deals with this case correctly. To use this please download both the orthofinder.py and trees_for_orthogroups.py files.
Thanks, I am trying this new version right now. I will redo the analyses from scratch then.
Hi again, I hit this error when running phase 1 first:
[I removed a bunch of the output to fit it here...]
1. Checking required programs are installed
Test can run "makeblastdb -help" - ok
Test can run "blastp -help" - ok
Test can run "mcl -h" - ok
2. Temporarily renaming sequences with unique, simple identifiers
Done
3. Dividing up work for BLAST for parallel processing
3a. Creating BLAST databases
4. Running BLAST all-versus-all
Maximum number of BLAST processer: 11
2015-10-19 12:30:51.644003 : This may take some time....
Done!
5. Running OrthoFinder algorithm
2015-10-22 02:51:58.743852 : Started
2015-10-22 02:51:59.966677 : Got sequence lengths
2015-10-22 02:51:59.966706 : Initial processing of each species
2015-10-22 07:47:18.320688 : Writen final scores for species 15 to graph file
[mclIO] reading <sixteen/Results_Oct19/WorkingDirectory/OrthoFinder_v0.2.8_graph.txt>
.......................................
[mclIO] read native interchange 517135x517135 matrix with 7956242 entries
[mcl] pid 27111
ite ------------------- chaos time hom(avg,lo,hi) expa expb expc fmv
1 ................... 121.04 5.24 1.00/0.03/9.22 2.88 2.52 2.52 0
42 ................... 0.00 0.13 1.00/1.00/1.00 1.00 1.00 0.08 0
[mcl] cut <14> instances of overlap
[mcl] jury pruning marks: <93,92,94>, out of 100
[mcl] jury pruning synopsis: <92.9 or scrumptious> (cf -scheme, -do log)
[mclIO] writing <sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt>
.......................................
[mclIO] wrote native interchange 517135x99346 matrix with 517135 entries to stream <sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt>
[mcl] 99346 clusters found
[mcl] output is in sixteen/Results_Oct19/WorkingDirectory/clusters_OrthoFinder_v0.2.8_I1.5.txt
Please cite:
2015-10-19 12:30:52.010419 : Running command: blastp -outfmt 6 -evalue 0.001 -query
sixteen/Results_Oct19/WorkingDirectory/Blast10_4.txt
2015-10-20 13:31:28.710983 : Running command: blastp -outfmt 6 -evalue 0.001 -query sixteen/Results_Oct19/WorkingDirectory/Species4.fa -db sixteen/Results_Oct19/WorkingDirectory/BlastDBSpecies10 -out sixteen/Results_Oct19/WorkingDirectory/Blast4_10.txt
2015-10-20 16:23:13.283020 : Finished command: blastp -outfmt 6 -evalue 0.001 -query sixteen/Results_Oct19/WorkingDirectory/Species4.fa -db sixteen/Results_Oct19/WorkingDirectory/BlastDBSpecies10 -out sixteen/Results_Oct19/WorkingDirectory/Blast4_10.txt
2015-10-20 16:23:13.283062 : Running command
2015-10-22 07:50:39.753291 : Ran MCL
6. Creating files for Orthologous Groups
When publishing work that uses OrthoFinder please cite: D.M. Emms & S. Kelly (2015), OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy, Genome Biology 16:157.
A duplicate accession was found using just first part: TR31292|c0_g1_i1|m.23117
Tried to use only the first part of the accession in order to list the sequences in each orthologous group more concisely but these were not unique. Will use the full accession line instead.
Orthologous groups have been written to tab-delimited files:
sixteen/Results_Oct19/OrthologousGroups.csv
sixteen/Results_Oct19/OrthologousGroups_UnassignedGenes.csv
And in OrthoMCL format:
sixteen/Results_Oct19/OrthologousGroups.txt
And then, when I run the phase to obtain the alignments:
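For anyone wanting to see which accessions collide, here is a minimal sketch of the check the message above describes: take the first whitespace-delimited word of each accession line and report any word that occurs more than once. This is not OrthoFinder's own code. The function names `first_word` and `find_duplicate_first_words` are made up for illustration, and the descriptions after the accession in the example are invented (only the accession itself comes from the log).

```python
from collections import defaultdict


def first_word(accession_line):
    """Return the first whitespace-delimited token of an accession line."""
    parts = accession_line.split()
    return parts[0] if parts else ""


def find_duplicate_first_words(accession_lines):
    """Map each repeated first word to the full lines that share it."""
    seen = defaultdict(list)
    for line in accession_lines:
        line = line.strip()
        if line:  # skip blank lines
            seen[first_word(line)].append(line)
    return {word: lines for word, lines in seen.items() if len(lines) > 1}


# Hypothetical input: the accession is the one from the log, the trailing
# descriptions are invented so that the full lines differ.
example = [
    "TR31292|c0_g1_i1|m.23117 sample_A",
    "TR31292|c0_g1_i1|m.23117 sample_B",
    "TR100|c0_g1_i1|m.1 sample_A",
]

duplicates = find_duplicate_first_words(example)
for word, lines in duplicates.items():
    print(word, "appears", len(lines), "times")
```

To run it on real data, pass the header lines of the input FASTA files (with the leading `>` stripped) instead of the `example` list.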
OrthoFinder Alignments and Trees version 0.2.8 Copyright (C) 2015 David Emms
Generating trees for orthogroups in file: sixteen/Results_Oct19/OrthologousGroups.txt
Using 11 threads for alignments and trees
Traceback (most recent call last):
  File "trees_for_orthogroups.py", line 310, in <module>
    idDict = GetIDsDict(orthofinderWorkingDir)
  File "trees_for_orthogroups.py", line 235, in GetIDsDict
    idExtract = orthofinder.FirstWordExtractor(orthofinderWorkingDir + "SequenceIDs.txt")
  File "/home/compartido2/andres/OrthoFinder/orthofinder.py", line 159, in __init__
    raise RuntimeError("A duplicate accession was found using just first part: % s" % accession)
RuntimeError: A duplicate accession was found using just first part: TR31292|c0_g1_i1|m.23117
And then it stopped. How can I fix this? Thanks in advance.
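One possible workaround on the input side, sketched here under assumptions: if the duplicates arise because the same identifier appears in more than one species' FASTA file (not confirmed in this thread), prefixing every header with a per-file label makes the first word unique again. This is not the fix referred to in this thread, which is the updated orthofinder.py and trees_for_orthogroups.py scripts. The helper `prefix_fasta_headers`, the label `speciesA`, and the example sequence are all hypothetical.

```python
def prefix_fasta_headers(lines, prefix):
    """Yield FASTA lines with prefix + "_" prepended to each header identifier.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + "_" + line[1:]
        else:
            yield line


# Hypothetical record: the accession is from the traceback, the description
# and the sequence are invented for illustration.
example = [
    ">TR31292|c0_g1_i1|m.23117 some description\n",
    "MKV\n",
]

renamed = list(prefix_fasta_headers(example, "speciesA"))
print("".join(renamed), end="")
```

With real files, open each input FASTA, pass the file object as `lines`, and write the yielded lines to a new file. Note that changing the headers means the BLAST step has to be rerun on the renamed files.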