Closed PeterMBlack closed 3 months ago
Hi!
This is an issue with cazy_webscraper, so I've raised an issue over there.
Initial checks indicate this is due to a CAZy database release. The latest release generates a file of >4 million lines, which must be parsed to identify all unique proteins (each identified by its NCBI protein version accession). I'll take a look at the cazy_webscraper code base and attempt some resource optimisation.
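To illustrate the kind of optimisation meant here, below is a minimal sketch of reading a large text file one line at a time and keeping only the unique accessions in memory, rather than loading the whole file. This is not the cazy_webscraper implementation. The function name `unique_accessions`, the tab-separated layout, the column index and the sample accessions are all assumptions made up for the example.

```python
import io
from typing import Iterable, Set


def unique_accessions(lines: Iterable[str], column: int = 3, sep: str = "\t") -> Set[str]:
    """Collect unique values from one column while streaming over lines.

    Hypothetical helper: the column index and separator are assumptions,
    not the documented layout of the CAZy download file.
    """
    seen: Set[str] = set()
    for line in lines:  # only one line is held in memory at a time
        fields = line.rstrip("\n").split(sep)
        if len(fields) > column and fields[column]:
            seen.add(fields[column])
    return seen


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "GH1\tBacteria\tOrganism A\tAAA00001.1\tncbi\n"
    "GH2\tBacteria\tOrganism A\tAAA00001.1\tncbi\n"
    "GH1\tEukaryota\tOrganism B\tBBB00002.1\tncbi\n"
)
accessions = unique_accessions(sample)
print(sorted(accessions))
```

With a real file, an open file handle can be passed in place of the `io.StringIO` object, since iterating over a file handle also yields one line at a time.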
For progress, check issue 123 at cazy_webscraper; I'll post here once I think the issue has been addressed.
This should be fixed by using the latest version of cazy_webscraper; see the relevant issue in cazy_webscraper.
Tried generating a local CAZy db on my Mac (8GB RAM), rather than on a cluster with more memory. Halfway through parsing the CAZy text file (51%) it maxes out my memory. Usage statistics suggested it was using 16GB of memory (split between my actual RAM and virtual memory).
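For anyone wanting to reproduce a measurement like this from within Python, here is a rough sketch using the standard-library `tracemalloc` module to record the peak memory allocated while a function runs. The helper `peak_memory_bytes` is a hypothetical name, not part of cazy_webscraper. Note that `tracemalloc` only tracks allocations made by Python itself, so its figure will not match the operating system's RAM and virtual memory totals.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python).

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example: measure a function that builds a large list.
data, peak = peak_memory_bytes(lambda: list(range(100_000)))
print(f"peak allocation: {peak / 1e6:.1f} MB")
```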