Bio.Entrez NotXMLError

Describe the bug

If Bio.Entrez raises NotXMLError while cw_get_genbank_seqs is retrieving protein sequences from NCBI, the tool crashes and none of the remaining protein sequences are retrieved.

To Reproduce

Command: cw_get_genbank_seqs all_cazy_2022-08-22.db <email> --families GH50

Error:

Traceback (most recent call last):
  File "/home/user/anaconda3/.../cw_get_genbank_seqs", line 33, in <module>
    sys.exit(load_entry_point('cazy-webscraper', 'console_scripts', 'cw_get_genbank_seqs')())
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 160, in main
    seq_dict, no_seq = get_sequences(genbank_accessions, args)  # {gbk_accession: seq}
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 297, in get_sequences
    seq_dict, success_accessions, failed_accessions = retry_failed_queries(
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 366, in retry_failed_queries
    new_seq_dict, no_seq = get_sequences(query, args, retry=True)
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 223, in get_sequences
    epost_webenv, epost_query_key = bulk_query_ncbi(query_list, args)
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 337, in bulk_query_ncbi
    epost_result = Entrez.read(
  File "/home/user/anaconda3/.../Bio/Entrez/__init__.py", line 508, in read
    record = handler.read(handle)
  File "/home/user/anaconda3/.../Bio/Entrez/Parser.py", line 345, in read
    raise NotXMLError(e) from None
Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (no element found: line 1, column 0). Please make sure that the input data are in XML format.

Expected behavior

cazy_webscraper should handle this error and continue retrieving the remaining protein sequences.
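One possible shape for this behaviour is sketched below. It is not the cazy_webscraper implementation: `call_with_retries` and `fetch_all` are hypothetical helper names, and the exception type to catch is passed in as a parameter so the sketch runs without Biopython installed. The idea is to retry a failing batch a few times and, if it still fails, record it and move on to the next batch instead of crashing.

```python
import time


def call_with_retries(func, retry_on, max_tries=3, delay=0.0):
    """Call func(); retry when it raises an exception of type retry_on.

    Returns (result, None) on success, or (None, last_error) when every
    attempt failed, so the caller can log the failure and carry on.
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return func(), None
        except retry_on as err:
            last_error = err
            if attempt < max_tries:
                time.sleep(delay)  # brief pause before the next attempt
    return None, last_error


def fetch_all(batches, query, retry_on, max_tries=3, delay=0.0):
    """Run query(batch) for every batch, collecting failures instead of raising.

    `query` is a hypothetical callable standing in for whatever function
    performs the NCBI request and parses the response.
    """
    results, failed = [], []
    for batch in batches:
        result, err = call_with_retries(
            lambda b=batch: query(b), retry_on, max_tries, delay
        )
        if err is None:
            results.append(result)
        else:
            failed.append(batch)  # keep going; report these at the end
    return results, failed
```

With Biopython, `retry_on` could be `Bio.Entrez.Parser.NotXMLError` (the exception shown in the traceback), and `query` could wrap the `Entrez.read(...)` call made in `bulk_query_ncbi`. A non-zero `delay` is probably sensible for real requests, since an empty response from NCBI may be transient.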

HobnobMancer / cazy_webscraper

Bio.Entrez NotXMLError #95
