dr-joe-wirth / phantasm

PHANTASM: PHylogenomic ANalyses for the TAxonomy and Systematics of Microbes
MIT License

Error for create makeSpeciesTreeWorkDir #11

Closed: luisruis closed this issue 8 months ago

luisruis commented 1 year ago

Hello Dr. Joe,

It's me again. I managed to run Option 3 (known reference genomes). PHANTASM has been analyzing and comparing my genomes for the past few days, but today, when the analysis was almost finished, the following error appeared:

[Image: untitled screenshot of the error]

dr-joe-wirth commented 1 year ago

could you please share the log file with me? my guess is that your genomes are very distantly related and the software was unable to calculate a core genome. To verify this, there should be a file called makeSpeciesTreeWorkDir/aabrhHardCore_concatenated.afa. If this file is empty, then this is your problem.
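The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `is_empty_file` and `report_empty` are hypothetical and not part of PHANTASM, and the path is assumed to be relative to the directory the run was started from.

```python
from pathlib import Path


def is_empty_file(path):
    """Return True if the file is missing or holds nothing but whitespace."""
    p = Path(path)
    if not p.is_file():
        return True
    return p.read_text().strip() == ""


def report_empty(paths):
    """Return the subset of the given paths that are missing or empty."""
    return [str(p) for p in paths if is_empty_file(p)]
```

Calling `report_empty(["makeSpeciesTreeWorkDir/aabrhHardCore_concatenated.afa"])` returns the path if the alignment is missing or empty, and an empty list otherwise.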

luisruis commented 1 year ago

the phantasm.log file has this content:

INFO:main:/phantasm/phantasm analyzeGenomes -i Refine_Genome_Caulobacter -m human_map.txt -e lxxxxxxxxx@cxxxxxxxx.mx
INFO:main:v1.1.0
INFO:main:num cpus: 1
INFO:main:reduce num core: False
INFO:main:bootstrap tree: False
INFO:main:num bootstraps: 0

INFO:main:start analyzeGenomes

INFO:PHANTASM.coreGenes.parseGenbank:Parsing genbank files ...
INFO:PHANTASM.coreGenes.parseGenbank:Done.

INFO:PHANTASM.coreGenes.allVsAllBlast:Running all pairwise blastp comparisons ...
INFO:PHANTASM.coreGenes.allVsAllBlast:Done.

INFO:PHANTASM.coreGenes.calculateCoreGenes:Calculating core genes ...
INFO:PHANTASM.coreGenes.calculateCoreGenes:Done.

INFO:PHANTASM.coreGenes.makeSpeciesTree:Aligning core genes ...
INFO:PHANTASM.coreGenes.makeSpeciesTree:Done.

Yes, the aabrhHardCore_concatenated.afa file is empty. The genomes that I want to analyze are all from the same bacterial genus; only the outgroup is a genome from a strain of a different genus. In this case, would PHANTASM have problems analyzing genomes of the same taxonomic genus?

dr-joe-wirth commented 1 year ago

are you able to share your genomes with me? Either your outgroup is too distantly related (should not be the case if they're in the same taxonomic family), or one or more of your genomes is not annotated properly. If it was the latter, then this could be why no core genes were detected. PHANTASM relies on annotations in order to extract the coding sequences from your genomes.
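A rough way to spot genomes with no annotated coding sequences is to count the CDS entries in each GenBank file. The sketch below is a plain text scan of the feature table, not the parser PHANTASM uses; the helper names and the `*.gbff` pattern are assumptions made for illustration, and a file can pass this check while still being malformed in other ways.

```python
from pathlib import Path


def count_cds(path):
    """Count feature-table lines whose feature key is CDS.

    In a GenBank flat file the feature key sits in columns 6-20,
    preceded by five spaces; qualifier lines are indented further.
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            if line[:5] == "     " and line[5:21].strip() == "CDS":
                count += 1
    return count


def find_unannotated(genome_dir, pattern="*.gbff"):
    """Return the names of files under genome_dir with zero CDS features."""
    return sorted(
        p.name for p in Path(genome_dir).glob(pattern) if count_cds(p) == 0
    )
```

For the run in this thread, `find_unannotated("Refine_Genome_Caulobacter")` would list any input files in which no CDS feature was found.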

luisruis commented 1 year ago

Hello Dr. Joe. Does my problem have a solution? :(

dr-joe-wirth commented 1 year ago

Hello Luis,

I have not had a chance to investigate this problem yet. I will get back to you by the end of the week.

dr-joe-wirth commented 1 year ago

A quick glance at your files reveals that several of your genomes are improperly formatted:

* Caulobacter_sp_AfrMine_TT107_68_72.gbff
* Caulobacter_vibrioides_UBA2596.gbff
* Caulobacter_sp_JGI_0001010-J14.gbff
* Caulobacter_vibrioides_GCA_951805235.gbff
* Caulobacter_rhizosphaerae_KCTC_52515.gbff

The warnings you received were BioPython telling you that something is wrong with those files. This is likely why your run failed. Remove those genomes and try again.

luisruis commented 1 year ago

Hello Dr. Joe,

I already removed the genomes that BioPython flagged as badly annotated, but I still don't get the final files. PHANTASM generates all the blast and fasta files from comparing all the genomes with each other, but the aabrhHardCore.out file is empty. The aabrhHardCore_concatenated.afa and aabrhHardCoreFamToGeneKey.txt files generated in the makeSpeciesTreeWorkDir folder are also empty. What changes should I make?

Below is a screenshot of what appears after the 10,404 blast files and the 816 fasta files are generated. In total, I am analyzing 102 genome annotations.

[Image: error_phantasm]

dr-joe-wirth commented 1 year ago

As before, please upload the log file. If aabrhHardCore.out is empty, then xenoGI (the software package used to calculate core genes) is not finding any core genes.

It is difficult to pinpoint exactly which genome(s) is causing the problem without examining the blastp tables. I recommend checking those files to find out which genome(s) lacks good hits to any other genome(s). The cut-offs are described in the publication.
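One way to triage the blastp tables is to count, for each genome, how many hits it has against every other genome and look at the genomes with the fewest. The sketch below rests on several assumptions that should be checked against the actual files: each table is tab-delimited with one file per ordered genome pair, the e-value sits in the column given by `evalue_col`, and `max_evalue` is a placeholder threshold rather than the cut-off from the publication. The function names and the `pair_to_path` mapping are hypothetical.

```python
import csv
from collections import defaultdict


def count_hits(path, evalue_col=10, max_evalue=1e-8):
    """Count rows whose e-value is at or below max_evalue.

    Rows that are too short or whose e-value field is not numeric
    (for example a header line) are skipped.
    """
    hits = 0
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) <= evalue_col:
                continue
            try:
                evalue = float(row[evalue_col])
            except ValueError:
                continue
            if evalue <= max_evalue:
                hits += 1
    return hits


def hits_per_genome(pair_to_path, evalue_col=10, max_evalue=1e-8):
    """Sum passing hits per query genome across all non-self comparisons.

    pair_to_path maps (query_genome, subject_genome) to a table path.
    Returns (genome, total_hits) tuples, fewest hits first.
    """
    totals = defaultdict(int)
    for (query, subject), path in pair_to_path.items():
        if query == subject:
            continue
        totals[query] += count_hits(path, evalue_col, max_evalue)
        totals[subject] += 0  # make sure every genome appears in the result
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))
```

Genomes at the top of the returned list have few or no acceptable hits to the rest of the set and are the first candidates to inspect or remove.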

In the meantime, I will reopen this issue to handle this error in a way that makes it clear that the specified set of genomes is incompatible with PHANTASM.