
HobnobMancer / cazy_webscraper

Web scraper to retrieve protein data catalogued by the CAZy, UniProt, NCBI, GTDB and PDB websites/databases.

https://hobnobmancer.github.io/cazy_webscraper/

MIT License


Reduced memory demand #123

Status: Open. HobnobMancer opened this issue 5 months ago.

HobnobMancer commented 5 months ago

Extremely large CAZy database releases are increasing the memory demand of cazy_webscraper, making it difficult to run the tool on standard office equipment (see the related issue in cazomevolve).

The problem lies in parsing the large CAZy text file (the CAZy database dump) and loading all of its data into a single dict, which is memory intensive. This approach needs to be changed and broken into smaller steps to reduce the memory footprint, ideally to below 8 GB.
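A minimal sketch of the streaming idea, assuming the dump is a plain-text file with one tab-separated record per line. The four-column layout (family, kingdom, organism, accession) and the functions `iter_records` and `iter_batches` are hypothetical and not part of cazy_webscraper. Instead of building one dict for the whole file, records are parsed lazily and passed on in fixed-size batches, so peak memory depends on `batch_size` rather than on the size of the release.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical column layout; the real CAZy dump format may differ.
Record = Tuple[str, str, str, str]  # (family, kingdom, organism, accession)


def iter_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one parsed record per non-empty line without reading the whole file."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines instead of aborting the parse
        yield (fields[0], fields[1], fields[2], fields[3])


def iter_batches(records: Iterable[Record], batch_size: int = 10_000) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size items."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, `with open(path) as fh: for batch in iter_batches(iter_records(fh)): ...` lets each batch be written to the database and discarded before the next one is read (`path` is a placeholder).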

HobnobMancer commented 3 months ago

Plans for development:

- Don't load the entire CAZy database dump into memory at once
- Use temp/mounting tables to reduce the amount of data held in memory
- Transition from sqlalchemy to sqlite3
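One way these plans could fit together is sketched below with Python's standard-library `sqlite3` module. The `family_members` and `staging` tables, their columns, and `load_batches` are hypothetical and do not reflect cazy_webscraper's actual schema. Each batch of parsed rows is inserted into a `TEMP` table, merged into the permanent table with `INSERT OR IGNORE ... SELECT` so that SQLite handles de-duplication, and then cleared, so Python only ever holds one batch at a time.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str]  # (accession, family); hypothetical columns


def load_batches(conn: sqlite3.Connection, batches: Iterable[List[Row]]) -> None:
    """Stage each batch in a TEMP table, then merge it into the permanent table."""
    cur = conn.cursor()
    # Request file-backed temporary storage instead of RAM
    # (a SQLite build compiled with SQLITE_TEMP_STORE=3 ignores this).
    cur.execute("PRAGMA temp_store = FILE")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS family_members ("
        " accession TEXT NOT NULL, family TEXT NOT NULL,"
        " PRIMARY KEY (accession, family))"
    )
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS staging (accession TEXT, family TEXT)")
    for batch in batches:
        cur.executemany("INSERT INTO staging (accession, family) VALUES (?, ?)", batch)
        # Merge into the permanent table; rows already present are skipped.
        cur.execute(
            "INSERT OR IGNORE INTO family_members (accession, family) "
            "SELECT accession, family FROM staging"
        )
        cur.execute("DELETE FROM staging")  # empty the staging table for the next batch
        conn.commit()  # one small transaction per batch
```

Committing once per batch keeps each transaction small, and pushing de-duplication into SQL avoids keeping a Python-side dict of every accession seen so far.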