Please complete this report in full and as much detail as possible. It will help with getting the bug fixed far sooner!
Describe the bug
While retrieving protein sequences from NCBI, if the Bio.Entrez NotXMLError is raised, the tool crashes and does not retrieve any of the remaining protein sequences.
To Reproduce
Please include the specific steps (including all code) you performed, so that we can check if the behaviour can be reproduced:
Command:
cw_get_genbank_seqs all_cazy_2022-08-22.db <email> --families GH50
Error:
Traceback (most recent call last):
  File "/home/user/anaconda3/.../cw_get_genbank_seqs", line 33, in <module>
    sys.exit(load_entry_point('cazy-webscraper', 'console_scripts', 'cw_get_genbank_seqs')())
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 160, in main
    seq_dict, no_seq = get_sequences(genbank_accessions, args)  # {gbk_accession: seq}
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 297, in get_sequences
    seq_dict, success_accessions, failed_accessions = retry_failed_queries(
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 366, in retry_failed_queries
    new_seq_dict, no_seq = get_sequences(query, args, retry=True)
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 223, in get_sequences
    epost_webenv, epost_query_key = bulk_query_ncbi(query_list, args)
  File "/home/user/.../cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 337, in bulk_query_ncbi
    epost_result = Entrez.read(
  File "/home/user/anaconda3/.../Bio/Entrez/__init__.py", line 508, in read
    record = handler.read(handle)
  File "/home/user/anaconda3/.../Bio/Entrez/Parser.py", line 345, in read
    raise NotXMLError(e) from None
Bio.Entrez.Parser.NotXMLError: Failed to parse the XML data (no element found: line 1, column 0). Please make sure that the input data are in XML format.
Expected behavior
cazy_webscraper should be able to handle this error and continue retrieving the rest of the protein sequences.
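One possible shape for the expected behaviour, as a minimal sketch only: wrap each batch query in a retry loop, catch the parse error, and record the failed accessions instead of letting the exception propagate. The names query_batches, query_fn and recoverable are hypothetical and are not taken from the cazy_webscraper code; the exception class is passed in as an argument so the sketch runs without Biopython installed. In the real tool the class to catch would presumably be Bio.Entrez.Parser.NotXMLError, as shown in the traceback.

```python
import logging
import time

logger = logging.getLogger(__name__)


def query_batches(batches, query_fn, recoverable, retries=3, delay=0.0):
    """Query every batch, skipping batches that keep failing instead of crashing.

    batches     -- iterable of lists of accessions (hypothetical input shape)
    query_fn    -- callable taking one batch and returning {accession: sequence}
    recoverable -- exception class (or tuple of classes) treated as recoverable
    Returns (seq_dict, failed_accessions).
    """
    seq_dict, failed = {}, []
    for batch in batches:
        for attempt in range(1, retries + 1):
            try:
                seq_dict.update(query_fn(batch))
                break  # batch succeeded, move on to the next one
            except recoverable as err:
                logger.warning(
                    "Attempt %d/%d failed for a batch of %d accessions: %s",
                    attempt, retries, len(batch), err,
                )
                time.sleep(delay)
        else:
            # Every attempt raised: remember the accessions and carry on.
            failed.extend(batch)
    return seq_dict, failed


# Assumed usage with Biopython (not executed here):
#   from Bio.Entrez.Parser import NotXMLError
#   seq_dict, failed = query_batches(batches, my_query_fn, NotXMLError)
```

With this structure a single unparseable response only costs the accessions in that batch, and the returned list of failed accessions can be logged or retried later.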