carolzhou / multiPhATE2

multiPhATE with comparative genomics

I think the non-alphanumeric characters in your fasta file header are causing the failure. If the above fix does not work, then feel free to re-open this issue. Thank you for your interest in multiPhATE2. #30

Closed smeeta closed 3 years ago

smeeta commented 3 years ago

I think the non-alphanumeric characters in your fasta file header are causing the failure. If the above fix does not work, then feel free to re-open this issue. Thank you for your interest in multiPhATE2.
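The fix referred to above lives in issue #29 and is not reproduced in this thread. As a rough illustration of the idea only, the following sketch rewrites FASTA header lines so that they contain just letters, digits, and underscores. The function names are hypothetical, and treating underscores as safe is an assumption; the thread does not say exactly which characters multiPhATE2 accepts.

```python
import re


def sanitize_fasta_header(line: str) -> str:
    """Return a header line with runs of non-alphanumeric characters replaced by '_'.

    Lines that do not start with '>' (sequence lines) are returned unchanged.
    Assumption: underscores are acceptable in headers.
    """
    if not line.startswith(">"):
        return line
    name = line[1:].strip()
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    return ">" + cleaned


def sanitize_fasta_text(text: str) -> str:
    """Apply sanitize_fasta_header to every line of a FASTA document."""
    return "\n".join(sanitize_fasta_header(line) for line in text.splitlines())
```

Applied to a header such as `>K1-ind1 | phage (draft)`, this yields `>K1_ind1_phage_draft`. Keep a copy of the original file, since the rewritten names no longer match any external records.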

Originally posted by @carolzhou in https://github.com/carolzhou/multiPhATE2/issues/29#issuecomment-819634209

smeeta commented 3 years ago

phate_sequenceAnnotation_main says, Writing genes file
phate_sequenceAnnotation_main says, Writing peptides file
phate_sequenceAnnotation_main says, Gene and protein files created.
phate_sequenceAnnotation_main says, Preparing for blast...
phate_sequenceAnnotation_main says, Preparing to run blastp
phate_sequenceAnnotation_main says, Skipping genome blast
phate_sequenceAnnotation_main says, Skipping gene blast.
phate_sequenceAnnotation_main says, Preparing for protein blast...
phate_sequenceAnnotation_main says, Running blastp against protein database(s)...
phate_blast says, Running PHANTOME blast: /Users/smeeta/myMultiphateDir/multiPhATE2-master/Databases/Phantome/Phantome_Phage_genes.faa phantome
** BLAST Database error: Error pre-fetching sequence data
Traceback (most recent call last):
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_sequenceAnnotation_main.py", line 1313, in
    blast.runBlast(myGenome.proteinSet,'protein')
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_blast.py", line 790, in runBlast
    self.blast1fasta(fasta,outfile,database,dbName)
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_blast.py", line 471, in blast1fasta
    tree.parse(outfile)
  File "/opt/anaconda3/envs/multiphate2/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
phate_runPipeline says, Sequence annotation processing is complete.
phate_runPipeline says, Code completed at 2021-04-15 10:55:42.963452
phate_runPipeline says, Checking files...
phate_runPipeline says, Configuration complete.
phate_runPipeline says, Preparing to run genecall module...
phate_runPipeline says, Calling the gene-call module.
phate_genecallPhage says, running Phanotate.

Output files:

(multiphate2) smeetas-MacBook-Air:clean_K1ind1 smeeta$ ls -l
total 400
drwxr-xr-x 3 smeeta staff 96 Apr 15 10:55 BLAST
-rw-r--r-- 1 smeeta staff 20 Apr 15 10:55 CGC_parser.log
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 CGC_parser.out
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 CGC_parser.tmp
-rw-r--r-- 1 smeeta staff 1823 Apr 15 10:54 clean_K1ind1.json
-rw-r--r-- 1 smeeta staff 42556 Apr 15 10:55 clean_K1ind1_cgp_gene.fnt
-rw-r--r-- 1 smeeta staff 14809 Apr 15 10:55 clean_K1ind1_cgp_protein.faa
-rw-r--r-- 1 smeeta staff 46937 Apr 15 10:55 gene.fnt
-rw-r--r-- 1 smeeta staff 6642 Apr 15 10:55 phanotate.cgc
-rw-r--r-- 1 smeeta staff 7884 Apr 15 10:55 phanotateOutput.txt
-rw-r--r-- 1 smeeta staff 2910 Apr 15 10:55 phate_genecallPhage.log
-rw-r--r-- 1 smeeta staff 6695 Apr 15 10:55 phate_phanotate.gff
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 phate_sequenceAnnotation_main.gff
-rw-r--r-- 1 smeeta staff 4244 Apr 15 10:55 phate_sequenceAnnotation_main.log
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 phate_sequenceAnnotation_main.out
-rw-r--r-- 1 smeeta staff 19617 Apr 15 10:55 protein.faa
-rw-r--r-- 1 smeeta staff 6562 Apr 15 10:55 results.txt
-rw-r--r-- 1 smeeta staff 4506 Apr 15 10:55 runPhATE.log
-rw-r--r-- 1 smeeta staff 632 Apr 15 10:55 tempGeneFile
-rw-r--r-- 1 smeeta staff 267 Apr 15 10:55 tempProtFile
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 trnaGenes.out
-rw-r--r-- 1 smeeta staff 3940 Apr 15 10:55 trnaStatistics.out
-rw-r--r-- 1 smeeta staff 0 Apr 15 10:55 trnaStructures.out
(multiphate2) smeetas-MacBook-Air:clean_K1ind1 smeeta$

smeeta commented 3 years ago

Why do I get this error: "BLAST Database error: Error pre-fetching sequence data"?
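One detail of the traceback above can be explained independently of the BLAST error: `xml.etree.ElementTree` raises `ParseError: no element found: line 1, column 0` when it is asked to parse an empty file. That would be consistent with blastp having written no XML after the database error, although the thread does not confirm it. The sketch below shows one way to guard such a parse; `parse_blast_xml` is a hypothetical helper, not a function from multiPhATE2.

```python
import os
import xml.etree.ElementTree as ET


def parse_blast_xml(path):
    """Return the parsed ElementTree, or None when there is nothing usable to parse.

    None is returned if the file is missing, is zero bytes long, or is not
    well-formed XML, so the caller can report the failed BLAST run instead of
    crashing inside the XML parser.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return None
    try:
        return ET.parse(path)
    except ET.ParseError:
        return None
```

A caller could then print a message such as "BLAST produced no output for this database" and continue, which makes the underlying BLAST error easier to spot in the log.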

carolzhou commented 3 years ago

This might be due to BLAST having changed its format with version 2.11 and beyond. Check your conda installation of blast. If it is 2.11+, then I suggest re-installing a previous version (e.g., 2.9) to see if that fixes the problem. In the meantime I will need to modify the code to accommodate both formatting schemes; this will take some time.

To check which version of blast you have:

$ conda list

To back-install to version 2.9:

$ conda install blast=2.9
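As an alternative to reading through the `conda list` output, the version can be read from the BLAST+ executable itself, which prints a line such as `blastp: 2.9.0+` when run with `-version`. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `blastp` is on the PATH of the active conda environment.

```python
import re
import subprocess


def parse_blast_version(text):
    """Extract (major, minor, patch) from text such as 'blastp: 2.11.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def installed_blastp_version():
    """Run 'blastp -version' and return its version tuple.

    Assumption: blastp is installed and on the PATH.
    """
    result = subprocess.run(
        ["blastp", "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(result.stdout)


def is_2_11_or_later(version):
    """True when the version tuple is 2.11 or newer."""
    return version[:2] >= (2, 11)
```

Calling `is_2_11_or_later(installed_blastp_version())` inside the multiphate2 environment tells you whether the downgrade suggested above applies to your installation.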

carolzhou commented 3 years ago

I did a test run with your genome, running Prodigal and using blastp against the Phantome database (conda-installed blast v2.11), and it worked, giving output in the phate_sequenceAnnotation_main.gff/out files. Please post your configuration file so I can double-check it and fully replicate your multiPhATE2 run.

smeeta commented 3 years ago

Hi Carol, It is running now. Will update soon.

carolzhou commented 3 years ago

Glad to hear it. Let me know how it goes.


smeeta commented 3 years ago

Hello Carol, I got the following for phate_sequenceAnnotation_main.gff.

Once again the annotation is really poor. Please see attached.

phate_sequenceAnnotation_main.xlsx

carolzhou commented 3 years ago

I beg to differ. The annotations look quite as expected. Most of the phage genes are identified. There are some without annotations, but this is not surprising.

smeeta commented 3 years ago

Apologies, Carol. I guess I am used to human gene annotation, which shows something like this:

PostAnotation_HumanGeneAnnotation

I fully understand that phages are less studied and highly heterogeneous, so we do not see annotations as complete as those for human genes.

Thank you so much. Regards, Smeeta