carolzhou / multiPhATE2

multiPhATE with comparative genomics

Error: NCBI C++ Exception: #31

Closed smeeta closed 3 years ago

smeeta commented 3 years ago

Dear Carol,

I had optimised my setup and the scripts were working fine. Today, when I tried a new set of phage genomes, I got the following error. Please help.

phate_blast says, Running VOG Gene blast: /Users/smeeta/myMultiphateDir/multiPhATE2-master/Databases/VOGs/vog.genes.tagged.all.fa vogGene
Error: NCBI C++ Exception:
    T0 "/Users/distiller/project/miniconda/conda-bld/blast_1615552848443/work/blast/c++/src/corelib/ncbiobj.cpp", line 981: Critical: (CCoreException::eNullPtr) ncbi::CObject::ThrowNullPointerException() - Attempt to access NULL pointer.

Traceback (most recent call last):
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_sequenceAnnotation_main.py", line 1247, in <module>
    blast.runBlast(myGenome.geneSet,'gene')
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_blast.py", line 700, in runBlast
    self.blast1fasta(fasta,outfile,database,dbName)
  File "/Users/smeeta/myMultiphateDir/multiPhATE2-master/SequenceAnnotation/phate_blast.py", line 471, in blast1fasta
    tree.parse(outfile)
  File "/opt/anaconda3/envs/multiphate2/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
phate_runPipeline says, Sequence annotation processing is complete.
phate_runPipeline says, Code completed at 2021-05-25 22:14:11.271182
multiPhate says, Skipping CompareGeneProfiles.
multiPhate says, Skipping Genomics.

regards smeeta
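
For anyone reading this traceback later: `xml.etree.ElementTree` raises "no element found: line 1, column 0" when the file it is asked to parse is empty, which is consistent with the BLAST step failing before it wrote any XML. Below is a minimal sketch of a guard around the parse. The helper name `parse_blast_xml` is hypothetical and is not part of multiPhATE2.

```python
import os
import xml.etree.ElementTree as ET


def parse_blast_xml(outfile):
    """Parse a BLAST XML output file, or return None if nothing usable was written.

    Hypothetical helper for illustration only; not multiPhATE2's own code.
    """
    # A missing or zero-byte output file usually means the BLAST run itself failed.
    if not os.path.isfile(outfile) or os.path.getsize(outfile) == 0:
        return None
    try:
        return ET.parse(outfile)
    except ET.ParseError:
        # Truncated or otherwise malformed XML.
        return None
```

A caller could treat a `None` result as "BLAST produced no output" and report the BLAST error message instead of a parser traceback.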

carolzhou commented 3 years ago

Are you using parallelism? I was getting this error when processes were clashing, but that was fixed, or so I thought, as I have not seen the error since.

smeeta commented 3 years ago

No, I am not using the parallelism parameter.

carolzhou commented 3 years ago

Let me know when you downloaded the code.

smeeta commented 3 years ago

I last downloaded it on 2nd April 2021.

carolzhou commented 3 years ago

Hmm. Then there must be a bug I have not encountered. Could you please send me your input file(s) by email: multiphate@gmail.com. In the meantime, please try running without parallelism.

carolzhou commented 3 years ago

And your configuration file too, please.

smeeta commented 3 years ago

Yes, I have emailed all the required documents. Thank you so much :) Super appreciate it!

smeeta commented 3 years ago

Hi Carol, a quick question about running case 2:

ncbi_virus_genome_blast='true'
vog_gene_blast='true'
phmmer='true'
swissprot_blast='true'
cazy_blast='true'
ncbi_virus_genome_database_path='/full_path_to_database/ncbiVirusGenomes.fasta'
vog_gene_database_path='/full_path_to_database/vog.genes.tagged.all.fa'
cazy_database_path='/full_path_to_database/CAZyDB.07312019.fa'
cazy_annotation_path='/full_path_to_database/CAZyDB.07312019.fam-activities.txt'

Do I need to download all the databases, i.e. pVOGhmm and VOGhmms as well?

Thanks Smeeta

carolzhou commented 3 years ago

You only need to download the databases you are going to use. Your previous run showed an error complaining about not locating one of the .tsv data files. The .tsv annotation files should be part of the VOG data downloads, unless the source has made changes I am not aware of.

smeeta commented 3 years ago

Hi,

Please note that I already have the file in my path, i.e. /Users/imac/multiPhATE2/Databases/VOGs. These are the files present:

-rw-r--r-- 1 imac staff 1199342 May 15 2020 vog.annotations.tsv
-rw-r--r-- 1 imac staff 9655941 Jun 1 13:06 vog.gene.headers.lst
-rw-r--r-- 1 imac staff 274748434 May 15 2020 vog.genes.all.fa
-rw-r--r-- 1 imac staff 227299521 May 30 03:39 vog.genes.tagged.all.fa
-rw-r--r-- 1 imac staff 28430435 May 30 07:36 vog.genes.tagged.all.fa.nhr
-rw-r--r-- 1 imac staff 3582452 May 30 07:36 vog.genes.tagged.all.fa.nin
-rw-r--r-- 1 imac staff 54534111 May 30 07:36 vog.genes.tagged.all.fa.nsq
-rw-r--r-- 1 imac staff 7197916 May 15 2020 vog.members.tsv
-rw-r--r-- 1 imac staff 95695921 May 15 2020 vog.proteins.all.fa
-rw-r--r-- 1 imac staff 82147548 May 30 07:35 vog.proteins.tagged.all.fa
-rw-r--r-- 1 imac staff 28430530 May 30 07:36 vog.proteins.tagged.all.fa.phr
-rw-r--r-- 1 imac staff 2388336 May 30 07:36 vog.proteins.tagged.all.fa.pin
-rw-r--r-- 1 imac staff 72491576 May 30 07:36 vog.proteins.tagged.all.fa.psq
-rw-r--r-- 1 imac staff 264 May 15 2020 vog_functional_categories.txt

smeeta commented 3 years ago

This is the error the script is showing:

phate_blast says, Running VOG Gene blast: /Users/imac/multiPhATE2/Databases/VOGs/vog.genes.tagged.all.fa vogGene
Traceback (most recent call last):
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_sequenceAnnotation_main.py", line 1247, in <module>
    blast.runBlast(myGenome.geneSet,'gene')
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_blast.py", line 700, in runBlast
    self.blast1fasta(fasta,outfile,database,dbName)
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_blast.py", line 580, in blast1fasta
    newAnnotation.link2databaseIdentifiers(database,dbName) # Get DBXREFs, packed into self.description
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_annotation.py", line 431, in link2databaseIdentifiers
    VOGlist = self.getVOGmembers(vogAnnotationFile,'vog')
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_annotation.py", line 250, in getVOGmembers
    vogAnnotation = self.findVOGannotation(vogID,database)
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_annotation.py", line 279, in findVOGannotation
    database_h = open(VOG_ANNOTATION_FILE,'r')
FileNotFoundError: [Errno 2] No such file or directory: '/vog.annotations.tsv'

But as mentioned, I already have the file, as shown above. The file is:

-rw-r--r-- 1 imac staff 1199342 May 15 2020 vog.annotations.tsv

thank you Smeeta

carolzhou commented 3 years ago

Please verify that your configuration file specifies the full path and filename for all of your data files, e.g. vog_gene_database_path='/full_path_to_database/vog.genes.tagged.all.fa'. Here "/full_path_to_database" needs to be replaced with something like "/Users/imac/multiPhATE2/Databases/VOGs".
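
One way to run this check mechanically is sketched below. It assumes the configuration contains settings of the form `name_path='value'`, as in the lines quoted earlier in this thread; the function name `find_missing_paths` and the regular expression are illustrative and are not taken from multiPhATE2. The earlier `FileNotFoundError` for '/vog.annotations.tsv' looks like a filename joined to an empty directory, which would fit a path setting that was never filled in, though that is only a guess.

```python
import os
import re

# Matches settings such as: vog_gene_database_path='/some/dir/file.fa'
# (assumed format, based on the config lines quoted in this thread)
PATH_SETTING = re.compile(r"(\w+_path)\s*=\s*'([^']*)'")


def find_missing_paths(config_text):
    """Return (setting, path) pairs whose path is not an existing file."""
    missing = []
    for name, path in PATH_SETTING.findall(config_text):
        if not os.path.isfile(path):
            missing.append((name, path))
    return missing
```

Reading the config file into a string and passing it to `find_missing_paths` lists every `*_path` setting that still points at a placeholder, a directory, or a file that does not exist.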

smeeta commented 3 years ago

Yes, I use the full path, as given below:

/Users/imac/multiPhATE2/Databases/VOGs/ vog.genes.tagged.all.fa

But I did as you suggested and put in this instead: /Users/imac/multiPhATE2/Databases/VOGs/

Error -

phate_blast says, Running VOG Gene blast: /Users/imac/multiPhATE2/Databases/VOGs/ vogGene
BLAST Database error: No alias or index file found for nucleotide database [/Users/imac/multiPhATE2/Databases/VOGs/] in search path [/Users/imac/multiPhATE2::]
Traceback (most recent call last):
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_sequenceAnnotation_main.py", line 1247, in <module>
    blast.runBlast(myGenome.geneSet,'gene')
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_blast.py", line 700, in runBlast
    self.blast1fasta(fasta,outfile,database,dbName)
  File "/Users/imac/multiPhATE2/SequenceAnnotation/phate_blast.py", line 471, in blast1fasta
    tree.parse(outfile)
  File "/Users/imac/opt/anaconda3/envs/multiphate2/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

thank you Smeeta
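
The "No alias or index file found" message is what BLAST reports when the database name it is given has no index files next to it. In the directory listing earlier in the thread, the index files sit beside the FASTA file as `vog.genes.tagged.all.fa.nhr`, `.nin` and `.nsq`, so the database name has to include the filename and not only the directory. A minimal sketch of that check follows; the function name `blast_db_present` is hypothetical, and it looks only for the `.nin`/`.pin` index file or a `.nal`/`.pal` alias file.

```python
import os


def blast_db_present(db_prefix, dbtype="nucl"):
    """Return True if a BLAST index or alias file exists for db_prefix.

    db_prefix is the database name handed to BLAST, e.g. the path of the
    FASTA file that makeblastdb was run on. Hypothetical helper, not part
    of multiPhATE2 or BLAST.
    """
    first = "n" if dbtype == "nucl" else "p"
    candidates = [
        db_prefix + "." + first + "in",  # index file (.nin or .pin)
        db_prefix + "." + first + "al",  # alias file (.nal or .pal)
    ]
    return any(os.path.isfile(path) for path in candidates)
```

With a bare directory such as `/Users/imac/multiPhATE2/Databases/VOGs/` as the prefix, the check looks for `/Users/imac/multiPhATE2/Databases/VOGs/.nin`, which does not exist, so it returns False.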

carolzhou commented 3 years ago

Please post or send me your complete configuration file.

smeeta commented 3 years ago

Hi Carol,

I managed to run the script. I removed the "Ref" dataset from the analysis/.config file.

Quick question, though: do you know of any pipeline that can be used to graph the output from multiPhATE2?

Any help would be greatly appreciated. Thanks a mil for the total support.

regards Smeeta

carolzhou commented 3 years ago

Hi Smeeta, You should be able to import the phate_sequenceAnnotation_main.gff file into some third-party codes. It's a bit tricky with standard formats; I have attempted to follow the GFF3 specifications, but there are different interpretations, so I cannot guarantee that every third-party code is consistent. But you might try Artemis/ACT. There are a number of other available codes for viewing genes on genomes and comparing across genomes, which I have not used. Are you good to go with multiPhATE? Shall we close this issue? Let me know if you have any other difficulties. -Carol
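
Before loading the file into a viewer, a quick sanity check of the GFF can help. The sketch below relies only on the general GFF3 layout (nine tab-separated columns with the feature type in the third, `#` lines as comments or directives, and an optional `##FASTA` section at the end). It has not been tested against multiPhATE2 output, and the function name `count_gff_feature_types` is hypothetical.

```python
from collections import Counter


def count_gff_feature_types(lines):
    """Count feature types (column 3) in an iterable of GFF3 lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line; skip it
        counts[columns[2]] += 1
    return counts
```

Passing an open file handle, for example `count_gff_feature_types(open("phate_sequenceAnnotation_main.gff"))`, gives a per-type tally that can be compared with the number of genes expected for the genome.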
