HobnobMancer / cazy_webscraper

Web scraper to retrieve protein data catalogued by the CAZy, UniProt, NCBI, GTDB and PDB websites/databases.
https://hobnobmancer.github.io/cazy_webscraper/
MIT License

Incorrect parsing of NCBI protein version accession #129

Open HobnobMancer opened 1 month ago

HobnobMancer commented 1 month ago

cazy_webscraper is not identifying the NCBI protein version accessions correctly, and is unable to pair up the downloaded data with data in the local CAZyme database.

Traceback (most recent call last):
  File "/cazy_env/bin/cw_get_genbank_seqs", line 8, in <module>
    sys.exit(main())
  File "/cazy_env/lib/python3.6/site-packages/cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 164, in main
    args,
  File "/cazy_env/lib/python3.6/site-packages/cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 321, in get_seqs_from_ncbi
    args,
  File "/cazy_env/lib/python3.6/site-packages/cazy_webscraper/expand/genbank/sequences/get_genbank_sequences.py", line 534, in parse_failed_connections
    failed_connections_batches["_".join(batch)]
KeyError: 'U_U_A_0_5_2_9_4_._1'
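
The key in the `KeyError` looks like the accession `UUA05294.1` with an underscore between every character. That is what `"_".join(batch)` produces when `batch` is a single string rather than a list of accessions. A minimal sketch of that behaviour follows, assuming this is the cause. The helper `batch_key` is hypothetical and is not part of cazy_webscraper.

```python
# Illustrative reproduction only: these names are assumptions,
# not code taken from cazy_webscraper.
accession = "UUA05294.1"

# str.join iterates over its argument. Given a plain string, it
# iterates over the characters, so every character gets a separator.
key_from_str = "_".join(accession)

# Given a list of accessions, each accession stays intact.
key_from_list = "_".join([accession])


def batch_key(batch):
    """Build a lookup key from one accession or a list of accessions.

    Hypothetical helper: wraps a lone string in a list so that
    str.join does not split it into characters.
    """
    if isinstance(batch, str):
        batch = [batch]
    return "_".join(batch)


print(key_from_str)   # U_U_A_0_5_2_9_4_._1
print(key_from_list)  # UUA05294.1
```

If this is what is happening, the key used to store a failed batch and the key used to look it up are built from different types, so the lookup fails.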

HobnobMancer commented 2 days ago

I am still working on this. I hope to have working code on the issues-129-ncbi branch by next Monday :crossed_fingers: