cruizperez / MicrobeAnnotator

Pipeline for metabolic annotation of microbial genomes
Artistic License 2.0

DB build error on step 11 #24

Open mariap3636 opened 3 years ago

mariap3636 commented 3 years ago

Hi, could you help me find a way to work around these two issues?

1) A parsing error raised from /software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py, line 368: ValueError: Problem with 'source' feature: 1..74 /organism="Psf dif8rs

2) A database error raised from /software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/conversion_database_creator.py, line 82, in create_refseq_to_uniprot: 'CREATE INDEX refseq_index ON refseq_to_uniprot (refseq_id)') sqlite3.OperationalError: disk I/O error

Both issues were raised when running the following command: microbeannotator_db_builder -d MicrobeAnnotator_DB -m diamond -t 8 --step 11 --no_aspera

At the end I have only the following tables:

ls -ltr ./MicrobeAnnotator_DB/*table
-rw-r--r-- 1 mariap3636 meta 193408369 Jun 3 18:55 ./MicrobeAnnotator_DB/uniprot_swissprot.table
-rw-r--r-- 1 mariap3636 meta 56168238388 Jun 5 00:08 ./MicrobeAnnotator_DB/uniprot_trembl.table

and this is the final output:

(BUSTER)mariap3636@skirit:~$ ls -ltr ./MicrobeAnnotator_DB/
total 60148256
drwxr-xr-x 3 mariap3636 meta 4096 Jun 3 18:48 kofam_data
drwxr-xr-x 2 mariap3636 meta 4096 Jun 3 18:53 temp_swissprot_dat_files
-rw-r--r-- 1 mariap3636 meta 193408369 Jun 3 18:55 uniprot_swissprot.table
drwxr-xr-x 2 mariap3636 meta 4096 Jun 4 01:46 temp_trembl_dat_files
-rw-r--r-- 1 mariap3636 meta 2302704979 Jun 5 00:08 uniprot_to_refseq.txt
-rw-r--r-- 1 mariap3636 meta 56168238388 Jun 5 00:08 uniprot_trembl.table
-rw------- 1 mariap3636 meta 0 Jun 22 14:25 microbeannotator.db
drwxr-xr-x 2 mariap3636 meta 4096 Jun 22 16:32 protein_db
drwx------ 2 mariap3636 meta 524288 Jun 22 23:31 temp_genbank   <- this is full
-rw------- 1 mariap3636 meta 2926403584 Jun 23 19:56 conversion.db

Here I copy-paste what seems to be the relevant part of the error message I received (I have attached the complete error message file):

Traceback (most recent call last):
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_genbank_parser.py", line 57, in table_creator
    for record in SeqIO.parse(uncompressed_genbank, "genbank"):
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 73, in __next__
    return next(self.records)
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 470, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 230, in parse_features
    features.append(self.parse_feature(feature_key, feature_lines))
  File "/software/conda-modules/5.3.1/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 368, in parse_feature
    ) from None
ValueError: Problem with 'source' feature: 1..74 /organism="Psf dif8rs
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/software/conda-modules/5.3.1/envs/microbeannotator/bin/microbeannotator_db_builder", line 445, in <module>
    main()

Thanks a lot for any feedback.

Maria

Attachment: microbeAnnotatorLastSteps.txt

silvtal commented 2 years ago

I think this happens when the GenBank files are corrupted because they weren't downloaded correctly. I made a custom loop that re-downloads individual temporary files whenever an error occurs, and that fixed it for me.
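
A minimal sketch of what such a retry loop could look like, assuming the temporary files are gzip-compressed GenBank downloads. This is not MicrobeAnnotator code and not the exact loop used above: gzip_is_intact and download_until_valid are hypothetical helper names, and the check only confirms that the gzip stream is complete. A stricter check could try to parse the file with Biopython's SeqIO.parse(handle, "genbank") and treat a ValueError as a failed download.

```python
import gzip
import zlib
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlretrieve


def gzip_is_intact(path: Path) -> bool:
    """Return True if the whole gzip file decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Reading to the end forces the CRC and length checks.
            while handle.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError: not a gzip file or bad CRC; EOFError: truncated file.
        return False


def download_until_valid(
    url: str,
    dest: Path,
    validate: Callable[[Path], bool] = gzip_is_intact,
    download: Optional[Callable[[str, Path], None]] = None,
    max_attempts: int = 3,
) -> bool:
    """Download url to dest, retrying until validate(dest) passes.

    Returns True on success and False if every attempt failed.
    """
    if download is None:
        # Default downloader (an assumption): plain urllib, no Aspera.
        def download(u: str, p: Path) -> None:
            urlretrieve(u, str(p))

    for _ in range(max_attempts):
        try:
            download(url, dest)
        except OSError:
            continue  # network problem: try again
        if validate(dest):
            return True
        if dest.exists():
            dest.unlink()  # drop the corrupted file before retrying
    return False
```

Calling download_until_valid(url, Path("temp_genbank") / name) for each file that failed to parse would replace only the broken files instead of restarting the whole step; the directory name here is taken from the listing above, and the URL has to be the one the file originally came from.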