cruizperez / MicrobeAnnotator

Pipeline for metabolic annotation of microbial genomes
Artistic License 2.0

Database download error on step 11 #86

Open Sidduppal opened 1 year ago

Sidduppal commented 1 year ago

Hey, I was able to download all database files through step 10, but I'm getting the following error on step 11. Any help would be appreciated. Thanks.

2023-08-15 07:49:29,220 [INFO]: Step 11
2023-08-15 07:49:29,227 [INFO]: Processing GenBank files
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented eqgl feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented 61 feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:305: BiopythonParserWarning: Non-standard feature line wrapping (didn't break on comma)?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented Region feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176966. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176967. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176968. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176969. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176970. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176971. Is it malformed?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented Region feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented COMPLETENESS: feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented source feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:190: BiopythonParserWarning: line too short to contain a feature: 'cetw7GA'
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented . feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented skla feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented invd62d(3 feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:190: BiopythonParserWarning: line too short to contain a feature: '    70'
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:215: BiopythonParserWarning: Over indented o6f feature?
  BiopythonParserWarning,
/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_genbank_parser.py", line 57, in table_creator
    for record in SeqIO.parse(uncompressed_genbank, "genbank"):
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 475, in feed
    misc_lines, sequence_string = self.parse_footer()
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 1240, in parse_footer
    raise ValueError("Sequence line mal-formed, '%s'" % line)
ValueError: Sequence line mal-formed, '     dhyltransferases'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sidd/miniconda3/envs/microbeannotator/bin/microbeannotator_db_builder", line 445, in <module>
    main()
  File "/home/sidd/miniconda3/envs/microbeannotator/bin/microbeannotator_db_builder", line 437, in main
    single_step, aspera, keep_temp, bin_path)
  File "/home/sidd/miniconda3/envs/microbeannotator/bin/microbeannotator_db_builder", line 213, in database_duilder
    temp_table_list = genbank.table_generator_worker(genbank_files, threads)
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_genbank_parser.py", line 112, in table_generator_worker
    temp_table_list = pool.map(table_creator, genbank_list)
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/sidd/miniconda3/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Sequence line mal-formed, '     dhyltransferases'
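
The traceback above does not say which GenBank file triggered the `ValueError`, because `pool.map` re-raises only the first worker exception. A minimal sketch of one way to narrow it down is to wrap the per-file function so failures are recorded instead of aborting the batch. The names `run_and_report`, `split_results`, and `fake_parser` are hypothetical and are not part of MicrobeAnnotator; `fake_parser` only stands in for the real per-file parser (`table_creator` in the traceback).

```python
from functools import partial


def run_and_report(func, path):
    """Call func(path) and return (path, result, error) without raising."""
    try:
        return path, func(path), None
    except Exception as exc:  # record any failure so the batch keeps going
        return path, None, f"{type(exc).__name__}: {exc}"


def split_results(outcomes):
    """Separate successful results from failures, both keyed by file path."""
    ok = {path: result for path, result, error in outcomes if error is None}
    failed = {path: error for path, result, error in outcomes if error is not None}
    return ok, failed


def fake_parser(path):
    """Hypothetical stand-in for the real parser; fails on one file."""
    if "bad" in path:
        raise ValueError("Sequence line mal-formed, '     dhyltransferases'")
    return path + ".table"


# Plain map() is used here; the same wrapped callable could be handed to
# multiprocessing's pool.map, assuming the wrapped function is picklable.
outcomes = list(map(partial(run_and_report, fake_parser),
                    ["a.gbff.gz", "bad.gbff.gz"]))
ok, failed = split_results(outcomes)
for path, error in failed.items():
    print(f"{path}: {error}")
```

Files listed in `failed` are the ones worth inspecting or downloading again; the rest of the batch still produces its tables.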

hmamine commented 9 months ago

I am having the same issue; it stops indefinitely at step 11:

Processing GenBank files
/home/User/miniconda3/envs/Microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1794: BiopythonParserWarning: Structured comment not parsed for YP_009176966. Is it malformed?
  BiopythonParserWarning,
/home/User/miniconda3/envs/Microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1794: BiopythonParserWarning: Structured comment not parsed for YP_009176967. Is it malformed?
  BiopythonParserWarning,

DNADoubleFelix commented 8 months ago

Has anyone managed to move past this issue on step 11? I am also getting the "Structured comment not parsed" warning for entries YP_009176966 through YP_009176971, just like @hmamine.

andriangajigan commented 6 months ago

Hi all,

I experience this error as well:

2024-03-04 05:30:44,430 [INFO]: Step 11
2024-03-04 05:30:44,430 [INFO]: Processing GenBank files
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176966. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176967. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176968. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176969. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176970. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1797: BiopythonParserWarning: Structured comment not parsed for YP_009176971. Is it malformed?
  BiopythonParserWarning,
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1236: BiopythonParserWarning: Invalid indentation for sequence line
  "Invalid indentation for sequence line", BiopythonParserWarning
/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1225: BiopythonParserWarning: Blank line in sequence data
  warnings.warn("Blank line in sequence data", BiopythonParserWarning)
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_genbank_parser.py", line 57, in table_creator
    for record in SeqIO.parse(uncompressed_genbank, "genbank"):
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 466, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 126, in parse_header
    line = self.handle.readline()
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/gzip.py", line 300, in read1
    return self._buffer.read1(size)
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/gzip.py", line 482, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid code lengths set
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/agajigan/.conda/envs/microbeannotator/bin/microbeannotator_db_builder", line 445, in <module>
    main()
  File "/home/agajigan/.conda/envs/microbeannotator/bin/microbeannotator_db_builder", line 437, in main
    single_step, aspera, keep_temp, bin_path)
  File "/home/agajigan/.conda/envs/microbeannotator/bin/microbeannotator_db_builder", line 213, in database_duilder
    temp_table_list = genbank.table_generator_worker(genbank_files, threads)
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_genbank_parser.py", line 112, in table_generator_worker
    temp_table_list = pool.map(table_creator, genbank_list)
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/agajigan/.conda/envs/microbeannotator/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
zlib.error: Error -3 while decompressing data: invalid code lengths set
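
The `zlib.error` here is raised while the gzip stream is being decompressed, before the GenBank parser sees the data, so a damaged or incomplete `.gz` download is a plausible cause (an assumption; the thread does not confirm it). A minimal sketch for checking every archive in a directory using only the standard library follows. The function names and the directory passed to `find_corrupt_archives` are hypothetical, not part of MicrobeAnnotator.

```python
import gzip
import zlib
from pathlib import Path


def check_gzip(path, chunk_size=1 << 20):
    """Return None if the gzip file decompresses fully, else an error string."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end in chunks so large files are not held in memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError: not a gzip file or bad CRC; EOFError: truncated stream;
        # zlib.error: corrupt compressed data, as in the traceback above.
        return f"{type(exc).__name__}: {exc}"
    return None


def find_corrupt_archives(directory, pattern="*.gz"):
    """Map each unreadable archive under `directory` to its error message."""
    bad = {}
    for path in sorted(Path(directory).rglob(pattern)):
        error = check_gzip(path)
        if error is not None:
            bad[path] = error
    return bad
```

Calling `find_corrupt_archives` on the folder that holds the downloaded GenBank archives returns the files that fail to decompress; those are the candidates to delete and download again before rerunning step 11.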