HobnobMancer / cazy_webscraper

Web scraper to retrieve protein data catalogued by the CAZy, UniProt, NCBI, GTDB and PDB websites/databases.
https://hobnobmancer.github.io/cazy_webscraper/
MIT License
13 stars, 3 forks

Reduced memory demand #123

Open HobnobMancer opened 5 months ago

HobnobMancer commented 5 months ago

Extremely large CAZy database releases are increasing the memory demand of cazy_webscraper, making it difficult to run the tool on standard office equipment (see the issue at cazomevolve).

The issue comes down to parsing the large CAZy text file (the CAZy db dump) and loading all of the data into a single dict, which is memory intensive. This method needs to be changed so that the file is processed in smaller pieces, reducing peak memory use, ideally to below 8GB.
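One possible way to break the parsing up is to stream the dump line by line and write records to the local database in fixed-size batches, so only one batch is held in memory at a time. The sketch below is illustrative only and is not the current cazy_webscraper code: the function names, the `cazy_records` table, and the assumed tab-separated column order (family, kingdom, organism, accession) are all hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumed (hypothetical) record layout: family, kingdom, organism, accession
Row = Tuple[str, str, str, str]


def iter_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one parsed record per line, skipping lines with too few fields."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed or header-like line
        yield (fields[0], fields[1], fields[2], fields[3])


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group an iterator of rows into lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_dump(lines: Iterable[str], conn: sqlite3.Connection,
              batch_size: int = 10_000) -> int:
    """Insert records batch by batch; return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cazy_records "
        "(family TEXT, kingdom TEXT, organism TEXT, accession TEXT)"
    )
    total = 0
    for batch in batched(iter_rows(lines), batch_size):
        conn.executemany(
            "INSERT INTO cazy_records VALUES (?, ?, ?, ?)", batch
        )
        conn.commit()  # commit per batch so the batch can be released
        total += len(batch)
    return total
```

Because a file handle is itself an iterable of lines, `load_dump(open_file_handle, conn)` would read the dump lazily. Peak memory then depends on `batch_size` rather than on the size of the CAZy release.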

HobnobMancer commented 3 months ago

Plans for development: