Closed: erin-thei closed this issue 1 year ago
Hi @erin-thei, the format of the orthogroups files is not correct. The first line should not be there, and each line should contain only a list of genes, with no commas. I guess you used OrthoFinder to generate these HOGs. You can try the script I wrote on the agora_dev branch in src/import: https://github.com/DyogenIBENS/Agora/blob/dev/src/import/orthofinder_hogs/convert_hogs_sp.py
I have not had the opportunity to test it through the whole ancestral reconstruction process, so I would greatly appreciate any feedback you can give me on it.
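To illustrate the layout described above (no header line, one family per line, genes not separated by commas), here is a minimal sketch. It is not the linked convert_hogs_sp.py script and does not handle splitting by ancestor. The function name `hogs_to_gene_lists`, the assumption of three leading metadata columns, and the choice of a single space as separator are all hypothetical and should be checked against your own files.

```python
import csv
import io


def hogs_to_gene_lists(tsv_text, n_meta_cols=3):
    """Turn an OrthoFinder-style HOG table into one gene list per line.

    Assumed input (not confirmed in this thread): tab-separated text with
    one header line, n_meta_cols leading metadata columns, then one column
    per species whose cells hold comma-separated gene names.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # drop the header line
    lines = []
    for row in reader:
        genes = []
        for cell in row[n_meta_cols:]:
            # split each species cell on commas and discard empty entries
            genes.extend(g.strip() for g in cell.split(",") if g.strip())
        if genes:
            # assumption: genes on a line are separated by a single space
            lines.append(" ".join(genes))
    return lines
```

Each returned string would then be written as one line of an orthologyGroups file, with no header and no commas.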
Hi @alouis72,
Thanks for your timely response. I will give that a try!
Since I'm new to this workflow, a couple of questions. Given my species tree, I was told to run OrthoFinder on all of the nodes (so I ran 68 iterations of OF). Each of those OF runs produced their own HOGs. Am I supposed to use that script for all of those? I guess I am a bit confused on the ancestral reconstruction process as a whole. Any help would be much appreciated. Thanks!
Hi again @alouis72 ,
I was able to get past the error I was facing earlier, but I got an error during the buildSynteny.pairwise-conservedPairs.py step saying: No such file or directory: 'ancGenes/all/ancGenes.NAME_0.list.bz2'. Upon inspecting the scripts, I printed phylTree.listAncestr:
['A10', 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A2', 'A20', 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A3', 'A30', 'A31', 'A32', 'A33', 'A34', 'A35', 'A36', 'A37', 'A38', 'A39', 'A4', 'A40', 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A5', 'A50', 'A51', 'A52', 'A53', 'A54', 'A55', 'A56', 'A57', 'A58', 'A59', 'A6', 'A60', 'A61', 'A62', 'A63', 'A64', 'A65', 'A66', 'A67', 'A68', 'A7', 'A8', 'A9', 'NAME_0']
Why is that last ancestor listed when it's not present in my species tree?
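A quick way to see which ancestors in that list have no ancGenes file is sketched below. The helper name `missing_ancestor_files` is hypothetical; the default file pattern is the one shown in the error message, and the ancestor names would be the printed list above.

```python
import os


def missing_ancestor_files(ancestors, pattern="ancGenes/all/ancGenes.%s.list.bz2"):
    """Return the ancestor names for which no file matching the pattern exists."""
    return [name for name in ancestors if not os.path.exists(pattern % name)]
```

Run from the working directory, `missing_ancestor_files(["A2", "A3", "NAME_0"])` would return only the names whose files are absent.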
Hi Erin, the root of the species tree has no name, so AGORA infers it as NAME_0, but there are no orthogroups for it. Either name the root and provide orthogroups for it (if you have them), or add the option "-target=A2" to the AGORA command line to build ancestor A2 and its descendants.
About your first question: I don't understand how you built your orthogroups. There may be a risk of inconsistency between ancestors. I know that OrthoFinder2 builds Hierarchical Orthogroups (Phylogenetic_Hierarchical_Orthogroups in the results), which are consistent across the species tree. Maybe you should try that.
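To check whether the root of a Newick file is named before running anything, a small string-based sketch like the following may help. The function name `newick_root_label` is hypothetical, this is not how AGORA itself parses the tree, and it assumes labels are unquoted and contain no parentheses.

```python
def newick_root_label(newick):
    """Return the label written on the root of a Newick string ('' if unnamed)."""
    text = newick.strip()
    if text.endswith(";"):
        text = text[:-1]
    # the root label, if any, is whatever follows the last closing parenthesis
    tail = text[text.rfind(")") + 1:]
    # drop an optional '[...]' comment and an optional ':branch_length'
    tail = tail.split("[", 1)[0]
    return tail.split(":", 1)[0].strip()
```

An empty result means the root is unnamed, which is the situation described above where the name NAME_0 is assigned automatically.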
Great, thanks for the information. I was actually able to fix the issue before your response and got it working successfully, which is great.
I haven't done a deep dive into the results yet, or how to interpret them, but does Agora report the average number of genes per synteny block? Or is that something that should be done manually?
Thanks so much for your help!
Hello,
I am trying to run Agora using my own data (the example worked with no issues). This is the command I tried to run: ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list
(agora) [theillere@Escalante3 Single_Copy_Orthologue_Sequences]$ ~/Agora/src/agora-basic.py species-tree.nwk orthologyGroups/orthologyGroups.%s.list genes/genes.%s.list
| Key | Values |
| speciesTree | species-tree.nwk |
| geneTrees|orthologyGroups | orthologyGroups/orthologyGroups.%s.list |
| genes | genes/genes.%s.list |
| target | |
| extantSpeciesFilter | |
| compress | bz2 |
| workingDir | . |
| nbThreads | 24 |
| forceRerun | False |
| sequential | True |
New task 0 ('ancgenes', 'all') [] Command(args=['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'], out='GeneTreeForest.withAncGenes.nhx.bz2', log='ancGenes/ancGenes.log')
New task 1 ('pairwise', 'ancgenes-all') [('ancgenes', 'all')] Command(args=['/home/theillere/Agora/src/buildSynteny.pairwise-conservedPairs.py', 'species-tree.nwk', 'NAME_0', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-OUT.pairwise=pairwise/pairs-all/%s.list.bz2'], out=None, log='pairwise/pairs-all/log')
New task 2 ('integr', 'denovo-all') [('pairwise', 'ancgenes-all')] Command(args=['/home/theillere/Agora/src/buildSynteny.integr-denovo.py', 'species-tree.nwk', 'NAME_0', '+searchLoops', '-OUT.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', 'pairwise/pairs-all/%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all/log')
New task 3 ('integr', 'denovo-all.scaffolds') [('integr', 'denovo-all')] Command(args=['/home/theillere/Agora/src/buildSynteny.integr-scaffolds.py', 'species-tree.nwk', 'NAME_0', '-OUT.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-IN.ancBlocks=ancBlocks/denovo-all/blocks.%s.list.bz2', '-genesFiles=genes/genes.%s.list.bz2', '-LOG.ancGraph=ancBlocks/denovo-all.scaffolds/graph.%s.txt.bz2'], out=None, log='ancBlocks/denovo-all.scaffolds/log')
New task 4 ('conversion', 'basic-workflow') [('integr', 'denovo-all.scaffolds')] Command(args=['/home/theillere/Agora/src/convert.ancGenomes.blocks-to-genes.py', 'species-tree.nwk', 'NAME_0', '+orderBySize', '-IN.ancBlocks=ancBlocks/denovo-all.scaffolds/blocks.%s.list.bz2', '-ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.ancGenomes=ancGenomes/basic-workflow/ancGenome.%s.list.bz2'], out=None, log='ancGenomes/basic-workflow/log')
Status: 5 to do, 0 running, 0 done, 0 failed -- 5 total
Available tasks: [0]
Control file ancGenes/ancGenes.log.agora missing
Launching task 0 ['/home/theillere/Agora/src/ALL.reformatGeneFamilies.py', 'species-tree.nwk', 'orthologyGroups/orthologyGroups.%s.list', '-IN.genesFiles=genes/genes.%s.list', '-OUT.ancGenesFiles=ancGenes/all/ancGenes.%s.list.bz2', '-OUT.genesFiles=genes/genes.%s.list.bz2'] > GeneTreeForest.withAncGenes.nhx.bz2 2> ancGenes/ancGenes.log
Status: 4 to do, 1 running, 0 done, 0 failed -- 5 total
Waiting ...
task 0 report: 0.106603 sec CPU time / 0.107803 sec elapsed = 98.8865% CPU usage, 17.625 MB RAM
task 0 is now finished (status 1)
Here is the input data that I'm working with: https://www.dropbox.com/scl/fo/en4rlnwvvnspv9sj51d3u/h?dl=0&rlkey=ybt2vi7hi09xfgnp2uuw85oz7
Please let me know if you have any insight as to how I can solve this issue. I'm also attaching the log file. Thanks!
Agora_Log.txt